GENES ENCODING OLFACTORY RECEPTORS AND BIALLELIC MARKERS THEREOF

FIELD OF THE INVENTION 

The present invention pertains to a purified or isolated nucleic acid comprising ten open
reading frames (ORFs) encoding ten different olfactory receptor-like proteins, non-coding regions
flanking the ORFs, as well as fragments thereof. The invention also provides recombinant expression
vectors and recombinant cell hosts containing a nucleic acid encoding said olfactory receptor
proteins. The invention also concerns the olfactory receptor proteins encoded by these ORFs as well
as polypeptides that are homologous to said olfactory receptor proteins and the peptide fragments of
both the olfactory receptor proteins and their homologous polypeptide counterparts. The invention
also deals with antibodies directed specifically against such polypeptides that are useful as
diagnostic reagents. The invention further encompasses biallelic markers of the olfactory receptor
gene useful in genetic analysis. The invention also deals with methods and kits for the detection of
the olfactory receptor proteins and with methods and kits for screening ligand molecules binding to
these proteins.

BACKGROUND OF THE INVENTION 

Throughout this application, various bibliographic publications are cited. Full bibliographic 
references for these publications may be found at the end of this application, preceding the sequence 
listing and the claims. 

OLFACTORY SYSTEM

The olfactory receptor cells, the first cells in the pathway that give rise to the sense of smell, 
lie in a small patch of membrane, the olfactory epithelium, in the upper part of the nasal cavity. 
These cells are specialized afferent neurons that have an enlarged extension analogous to a dendrite. 
Several long hairlike processes extend out from this extension along the surface of the olfactory 
epithelium where they are bathed in mucus. The hairlike processes contain the receptor proteins for
olfactory stimuli. The axons of these neurons form the olfactory nerve. 

For an odorous substance, called an odorant, to be detected, molecules of the
substance must first diffuse into the air and pass into the nose to the region of the olfactory
epithelium. Once there, they dissolve in the mucus that covers the epithelium and then bind to
specific receptor proteins on the cilia.

Although there are many thousands of olfactory neurons, each contains one, or at most a 
few, of the 1,000 or so different receptor types, each of which responds only to a specific chemically 
related group of odorant molecules. Each odorant has characteristic chemical groups that distinguish 
it from other odorants, and each of these groups activates a different receptor type. Thus the identity
of a particular odorant is determined by the activation of a precise combination of receptors, each of
which is contained in a distinct group of olfactory neurons. 

The axons of the olfactory neurons synapse in the brain structures known as olfactory bulbs, 
which lie on the undersurface of the frontal lobes. Axons from olfactory neurons sharing a common 
receptor specificity synapse together on certain olfactory-bulb neurons, thereby maintaining the
specificity of the original stimuli. 

OLFACTORY RECEPTORS 

In contrast with the immunoglobulin system, the diversity of olfactory receptors is encoded 
by a large germ-line repertoire of olfactory receptor genes. The size of the olfactory receptor gene 
family in the human genome is unknown, but it has been estimated to encompass 200 to 1,000 genes.

The locations of only a few human genes have been determined to date. The picture that has 
emerged so far is that several large clusters of olfactory genes and pseudogenes span hundreds of 
kilobases on several chromosomes. Using FISH analyses, more than 25 distinct locations of
olfactory receptor genes have been identified in the human genome.

In mammals, the olfactory epithelium appears to be organized into distinct topographic
regions or zones in which expression of a particular receptor gene appears to be restricted to one of
the four zones in the epithelium. Within the zone, the distribution of neurons expressing a given 
receptor is random. Chromosomal mapping studies have revealed clusters of odorant receptor genes 
at a single locus, and numerous such loci have been mapped to different chromosomes. However, 
receptors expressed in the same zone map to different loci, and a single locus can contain genes
expressed in different zones. A putative odorant receptor promoter, consisting of the 6.7 kb DNA 
fragment upstream of the receptor coding region, has been shown to be sufficient to direct olfactory 
receptor expression in a tissue-specific, zonal-specific manner. 

Olfactory receptors share a seven-transmembrane domain structure (TM1 to TM7) with 
many neurotransmitter and hormone receptors. They show a high degree of sequence similarity in
some conserved domains (TM2 and TM7) as well as regions of diversity (TM3, TM4, TM5, and 
TM6). They are responsible for the recognition and G protein-mediated transduction of odorant 
signals. The genes encoding these receptors are devoid of introns within their coding regions. 

Olfactory receptors display all hallmarks of the G-protein coupled receptor superfamily but
also have some unique motifs. Most notably, they appear to be minimal in structure, with very short
cytoplasmic and extracellular loops. In addition, they display a striking structural diversity in the
third, fourth and fifth transmembrane domains, which are thought to form the hydrophobic core of
these proteins and may form the ligand binding site of the receptors.

An understanding of the genetic basis of olfaction and a knowledge of olfactory receptors
are important to enable the design of fragrances, the identification of compounds which control
appetite, or the detection of compounds which can be harmful or dangerous.

SUMMARY OF THE INVENTION

This invention provides a nucleic acid molecule encoding ten different olfactory receptor- 
like proteins (OLF). 

The invention also deals with a nucleic acid molecule comprising a nucleotide sequence
encoding an olfactory receptor-like protein, which nucleotide sequence is selected from the group
consisting of SEQ ID Nos 2-11, as well as with the corresponding polypeptide encoded by this
nucleotide sequence and with antibodies directed against the corresponding polypeptide.

Oligonucleotide probes or primers hybridizing specifically with an olfactory receptor 
genomic sequence are also part of the present invention, as well as DNA amplification and detection 
methods using said primers and probes.

The invention also concerns a purified and/or isolated biallelic marker located in the
sequence of the olfactory receptor gene cluster of the invention, wherein said biallelic marker is
useful as a diagnostic tool to detect an allele associated with a specific phenotype relating
to the olfactory system, including an alteration of the olfactory perception of substances or
molecules.

A further object of the invention consists of recombinant vectors comprising any of the 
nucleic acid sequences described above, and in particular of recombinant vectors comprising a 
sequence encoding an olfactory receptor protein, as well as of cell hosts and transgenic non-human
animals comprising said nucleic acid sequences or recombinant vectors.

A further object of the invention consists of methods for screening substances or molecules
interacting with an olfactory receptor encoded by any of the nucleic acid molecules described above.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1: Alignment of the amino acid sequences of the olfactory polypeptides encoded by
the open reading frames of the olfactory receptor gene cluster of the invention. The lower line
represents the consensus sequence. The locations of the seven transmembrane domains TM1 to TM7 
are boxed. 

BRIEF DESCRIPTION OF THE SEQUENCES PROVIDED IN THE SEQUENCE LISTING

SEQ ID No 1 contains the olfactory receptor genomic sequence.

SEQ ID Nos 2-11 contain the nucleotide sequences of the open reading frames of
SEQ ID No 1 encoding the OLF1 to OLF10 polypeptides.

SEQ ID Nos 12-21 contain the amino acid sequences of the OLF1 to OLF10 polypeptides
encoded by the open reading frames of SEQ ID Nos 2-11.

SEQ ID Nos 22-25 contain the amplification primers used for FISH experiments described
in Example 1.

SEQ ID No 26 contains a primer containing the additional PU 5' sequence described further 
in Example 3. 

SEQ ID No 27 contains a primer containing the additional RP 5' sequence described further
in Example 3.

In accordance with the regulations relating to Sequence Listings, the following codes have
been used in the Sequence Listing to indicate the locations of biallelic markers within the sequences
and to identify each of the alleles present at the polymorphic base. The code "r" in the sequences
indicates that one allele of the polymorphic base is a guanine, while the other allele is an adenine.
The code "y" in the sequences indicates that one allele of the polymorphic base is a thymine, while
the other allele is a cytosine. The code "m" in the sequences indicates that one allele of the
polymorphic base is an adenine, while the other allele is a cytosine. The code "k" in the sequences
indicates that one allele of the polymorphic base is a guanine, while the other allele is a thymine.
The code "s" in the sequences indicates that one allele of the polymorphic base is a guanine, while
the other allele is a cytosine. The code "w" in the sequences indicates that one allele of the
polymorphic base is an adenine, while the other allele is a thymine.
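
The six two-base codes described above can be expressed as a simple lookup table. The following is a minimal illustrative sketch only; the names `BIALLELIC_CODES`, `alleles_for_code` and `find_biallelic_sites` are hypothetical and do not appear in the Sequence Listing.

```python
# Two-base ambiguity codes used to mark biallelic sites, as listed in the
# paragraph above. All names here are illustrative assumptions.
BIALLELIC_CODES = {
    "r": ("G", "A"),
    "y": ("T", "C"),
    "m": ("A", "C"),
    "k": ("G", "T"),
    "s": ("G", "C"),
    "w": ("A", "T"),
}


def alleles_for_code(code):
    """Return the two alternative bases denoted by an ambiguity code."""
    return BIALLELIC_CODES[code.lower()]


def find_biallelic_sites(sequence):
    """Return (1-based position, alleles) for each ambiguity code found."""
    return [
        (index + 1, BIALLELIC_CODES[base])
        for index, base in enumerate(sequence.lower())
        if base in BIALLELIC_CODES
    ]
```

For example, scanning a short hypothetical sequence such as "acgtrcc" with `find_biallelic_sites` would report a single G/A site at position 5.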

The nucleotide code of the original allele for each biallelic marker is the following: 



Biallelic marker    Original allele
99-13670-305        G
99-13669-471        G
99-13666-275        A
99-13664-221        T
99-13663-218        G
99-13660-277        C
99-13652^07         G
99-13652-357        A
99-13652-308        A
99-13671-396        A
99-13649-286        C
99-13648-259        G
99-13647-278        G



DETAILED DESCRIPTION OF THE INVENTION 

The aim of the present invention is to provide polynucleotides and polypeptides related to
novel olfactory receptors, notably useful in order to design suitable means for detecting specific
odorant molecules in a material sample, particularly in a material sample suspected to contain an
odorant molecule that consists of one of the specific ligands for the olfactory receptors of the
invention.

DEFINITIONS

Before describing the invention in greater detail, the following definitions are set forth to 
illustrate and define the meaning and scope of the terms used to describe the invention herein. 

General definitions 

The terms "olfactory receptor gene" or "OLF1 to OLF10 genes", when used herein,
encompass genomic, mRNA and cDNA sequences encoding the OLF1 to OLF10 olfactory
receptor proteins.

The term "heterologous protein", when used herein, is intended to designate any protein or
polypeptide other than the OLF1 to OLF10 proteins.

The term "isolated" requires that the material be removed from its original environment
(e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring
polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide
or DNA or polypeptide, separated from some or all of the coexisting materials in the natural system,
is isolated. Such polynucleotide could be part of a vector and/or such polynucleotide or polypeptide
could be part of a composition, and still be isolated in that the vector or composition is not part of its
natural environment.

The term "purified" does not require absolute purity; rather, it is intended as a relative
definition. Purification of starting material or natural material to at least one order of magnitude,
preferably two or three orders, and more preferably four or five orders of magnitude is expressly
contemplated. As an example, purification from 0.1% concentration to 10% concentration is two
orders of magnitude. The term "purified polynucleotide" is used herein to describe a polynucleotide
or polynucleotide vector of the invention which has been separated from other compounds including,
but not limited to, other nucleic acids, carbohydrates, lipids and proteins (such as the enzymes used
in the synthesis of the polynucleotide), or the separation of covalently closed polynucleotides from
linear polynucleotides. A polynucleotide is substantially pure when at least about 50%, preferably
60 to 75%, of a sample exhibits a single polynucleotide sequence and conformation (linear versus
covalently closed). A substantially pure polynucleotide typically comprises about 50%, preferably 60
to 90% weight/weight of a nucleic acid sample, more usually about 95%, and preferably is over
about 99% pure. Polynucleotide purity or homogeneity is indicated by a number of means well
known in the art, such as agarose or polyacrylamide gel electrophoresis of a sample, followed by
visualizing a single polynucleotide band upon staining the gel. For certain purposes higher
resolution can be provided by using HPLC or other means well known in the art.
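
The "orders of magnitude" arithmetic in the definition above is a base-10 logarithm of the enrichment ratio. A minimal sketch follows, assuming both concentrations are expressed in the same unit; the function name `orders_of_magnitude` is hypothetical.

```python
import math


def orders_of_magnitude(start_concentration, end_concentration):
    """Return the enrichment between two concentrations in powers of ten.

    Both values must use the same unit (for example, percent).
    """
    if start_concentration <= 0 or end_concentration <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(end_concentration / start_concentration)
```

Under this sketch, going from 0.1% to 10% is a 100-fold enrichment, i.e. two orders of magnitude, which matches the example given in the definition.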

The term "polypeptide" refers to a polymer of amino acids without regard to the length of
the polymer; thus, peptides, oligopeptides, and proteins are included within the definition of
polypeptide. This term also does not specify or exclude post-expression modifications of
polypeptides; for example, polypeptides which include the covalent attachment of glycosyl groups,
acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by the term
polypeptide. Also included within the definition are polypeptides which contain one or more
analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids
which only occur naturally in an unrelated biological system, modified amino acids from
mammalian systems etc.), polypeptides with substituted linkages, as well as other modifications
known in the art, both naturally occurring and non-naturally occurring.

The term "recombinant polypeptide" is used herein to refer to polypeptides that have been
artificially designed and which comprise at least two polypeptide sequences that are not found as
contiguous polypeptide sequences in their initial natural environment, or to refer to polypeptides
which have been expressed from a recombinant polynucleotide.

The term "purified polypeptide" is used herein to describe a polypeptide of the invention
which has been separated from other compounds including, but not limited to, nucleic acids, lipids,
carbohydrates and other proteins. A polypeptide is substantially pure when at least about 50%,
preferably 60 to 75%, of a sample exhibits a single polypeptide sequence. A substantially pure
polypeptide typically comprises about 50%, preferably 60 to 90% weight/weight of a protein sample,
more usually about 95%, and preferably is over about 99% pure. Polypeptide purity or homogeneity
is indicated by a number of means well known in the art, such as polyacrylamide gel electrophoresis
of a sample, followed by visualizing a single polypeptide band upon staining the gel. For certain
purposes higher resolution can be provided by using HPLC or other means well known in the art.

As used herein, the term "non-human animal" refers to any non-human vertebrate, birds and
more usually mammals, preferably primates, farm animals such as swine, goats, sheep, donkeys, and
horses, rabbits or rodents, more preferably rats or mice. As used herein, the term "animal" is used to
refer to any vertebrate, preferably a mammal. Both the terms "animal" and "mammal" expressly
embrace human subjects unless preceded with the term "non-human".

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides which
is comprised of at least one binding domain, where an antibody binding domain is formed from the
folding of variable domains of an antibody molecule to form three-dimensional binding spaces with
an internal surface shape and charge distribution complementary to the features of an antigenic
determinant of an antigen, which allows an immunological reaction with the antigen. Antibodies
include recombinant proteins comprising the binding domains, as well as fragments, including Fab,
Fab', F(ab)2, and F(ab')2 fragments.

As used herein, an "antigenic determinant" is the portion of an antigen molecule, in this case
an OLF1 to OLF10 polypeptide, that determines the specificity of the antigen-antibody reaction. An
"epitope" refers to an antigenic determinant of a polypeptide. An epitope can comprise as few as 3
amino acids in a spatial conformation which is unique to the epitope. Generally an epitope
comprises at least 6 such amino acids, and more usually at least 8-10 such amino acids. Methods for
determining the amino acids which make up an epitope include x-ray crystallography, 2-dimensional
nuclear magnetic resonance, and epitope mapping, e.g., the Pepscan method described by Geysen et
al. 1984; PCT Publication No. WO 84/03564; and PCT Publication No. WO 84/03506.

Throughout the present specification, the expression "nucleotide sequence" may be
employed to designate indifferently a polynucleotide or a nucleic acid. More precisely, the
expression "nucleotide sequence" encompasses the nucleic material itself and is thus not restricted to
the sequence information (i.e. the succession of letters chosen among the four base letters) that
biochemically characterizes a specific DNA or RNA molecule.

As used interchangeably herein, the terms "nucleic acids", "oligonucleotides", and
"polynucleotides" include RNA, DNA, or RNA/DNA hybrid sequences of more than one nucleotide
in either single chain or duplex form. The term "nucleotide" is used herein as an adjective to
describe molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of any length in
single-stranded or duplex form. The term "nucleotide" is also used herein as a noun to refer to individual
nucleotides or varieties of nucleotides, meaning a molecule, or individual unit in a larger nucleic
acid molecule, comprising a purine or pyrimidine, a ribose or deoxyribose sugar moiety, and a
phosphate group, or phosphodiester linkage in the case of nucleotides within an oligonucleotide or
polynucleotide. The term "nucleotide" is also used herein to encompass "modified nucleotides"
which comprise at least one of the following modifications: (a) an alternative linking group, (b) an
analogous form of purine, (c) an analogous form of pyrimidine, or (d) an analogous sugar. For
examples of analogous linking groups, purines, pyrimidines, and sugars, see for example PCT
publication No. WO 95/04064.
The polynucleotide sequences of the invention may be prepared by any known method, including
synthetic, recombinant, ex vivo generation, or a combination thereof, as well as utilizing any
purification methods known in the art.

A "promoter" refers to a DNA sequence recognized by the synthetic machinery of the cell
required to initiate the specific transcription of a gene.

A sequence which is "operably linked" to a regulatory sequence such as a promoter means
that said regulatory element is in the correct location and orientation in relation to the nucleic acid to
control RNA polymerase initiation and expression of the nucleic acid of interest. As used herein, the
term "operably linked" refers to a linkage of polynucleotide elements in a functional relationship.
For instance, a promoter or enhancer is operably linked to a coding sequence if it affects the
transcription of the coding sequence. More precisely, two DNA molecules (such as a polynucleotide
containing a promoter region and a polynucleotide encoding a desired polypeptide or
polynucleotide) are said to be "operably linked" if the nature of the linkage between the two
polynucleotides does not (1) result in the introduction of a frame-shift mutation or (2) interfere with
the ability of the polynucleotide containing the promoter to direct the transcription of the coding
polynucleotide.

The term "vector" is used herein to designate either a circular or a linear DNA or RNA
molecule, which is either double-stranded or single-stranded, and which comprises at least one
polynucleotide of interest that is sought to be transferred into a cell host or into a unicellular or
multicellular host organism.

The term "primer" denotes a specific oligonucleotide sequence which is complementary to a
target nucleotide sequence and used to hybridize to the target nucleotide sequence. A primer serves
as an initiation point for nucleotide polymerization catalyzed by either DNA polymerase, RNA
polymerase or reverse transcriptase.

The term "probe" denotes a defined nucleic acid segment (or nucleotide analog segment,
e.g., polynucleotide as defined hereinbelow) which can be used to identify a specific polynucleotide
sequence present in samples, said nucleic acid segment comprising a nucleotide sequence
complementary to the specific polynucleotide sequence to be identified.

The terms "trait" and "phenotype" are used interchangeably herein and refer to any visible,
detectable or otherwise measurable property of an organism such as, for example, symptoms of, or
susceptibility to, a disease.

The term "allele" is used herein to refer to variants of a nucleotide sequence. A biallelic
polymorphism has two forms. Diploid organisms may be homozygous or heterozygous for an allelic
form.

The term "genotype" as used herein refers to the identity of the alleles present in an individual
or a sample. In the context of the present invention, a genotype preferably refers to the description
of the biallelic marker alleles present in an individual or a sample. The term "genotyping" a sample
or an individual for a biallelic marker involves determining the specific allele or the specific
nucleotide carried by an individual at a biallelic marker.

The term "mutation" as used herein refers to a difference in DNA sequence between or
among different genomes or individuals which has a frequency below 1%.

The term "polymorphism" as used herein refers to the occurrence of two or more alternative
genomic sequences or alleles between or among different genomes or individuals. "Polymorphic"
refers to the condition in which two or more variants of a specific genomic sequence can be found in
a population. A "polymorphic site" is the locus at which the variation occurs. A single nucleotide
polymorphism is the replacement of one nucleotide by another nucleotide at the polymorphic site.
Deletion of a single nucleotide or insertion of a single nucleotide also gives rise to single nucleotide
polymorphisms. In the context of the present invention, "single nucleotide polymorphism"
preferably refers to a single nucleotide substitution. Typically, between different individuals, the
polymorphic site may be occupied by two different nucleotides.

The terms "biallelic polymorphism" and "biallelic marker" are used interchangeably herein to
refer to a single nucleotide polymorphism having two alleles at a fairly high frequency in the
population. A "biallelic marker allele" refers to the nucleotide variants present at a biallelic marker
site.

The location of nucleotides in a polynucleotide with respect to the center of the
polynucleotide is described herein in the following manner. When a polynucleotide has an odd
number of nucleotides, the nucleotide at an equal distance from the 3' and 5' ends of the
polynucleotide is considered to be "at the center" of the polynucleotide, and any nucleotide
immediately adjacent to the nucleotide at the center, or the nucleotide at the center itself, is
considered to be "within 1 nucleotide of the center." With an odd number of nucleotides in a
polynucleotide, any of the five nucleotide positions in the middle of the polynucleotide would be
considered to be within 2 nucleotides of the center, and so on. When a polynucleotide has an even
number of nucleotides, there would be a bond and not a nucleotide at the center of the
polynucleotide. Thus, either of the two central nucleotides would be considered to be "within 1
nucleotide of the center" and any of the four nucleotides in the middle of the polynucleotide would
be considered to be "within 2 nucleotides of the center", and so on.
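
The "within n nucleotides of the center" convention defined above can be stated compactly in code. The following is a minimal sketch under the assumption that positions are counted from 1 at the 5' end; the function name `within_n_of_center` is hypothetical.

```python
def within_n_of_center(length, position, n):
    """Return True if a 1-based position is within n nucleotides of the center.

    Odd length: the center is a nucleotide, so positions center-n..center+n
    qualify (2n + 1 positions). Even length: the center is a bond, so the
    n positions on each side of that bond qualify (2n positions).
    """
    if not 1 <= position <= length:
        raise ValueError("position lies outside the polynucleotide")
    if length % 2 == 1:
        center = (length + 1) // 2
        return abs(position - center) <= n
    left_of_bond = length // 2        # last nucleotide before the central bond
    right_of_bond = left_of_bond + 1  # first nucleotide after the central bond
    if position <= left_of_bond:
        return left_of_bond - position < n
    return position - right_of_bond < n
```

For a 47-nucleotide fragment, position 24 is at the center, positions 23 to 25 are within 1 nucleotide of the center, and positions 22 to 26 are within 2. For a 10-nucleotide fragment, positions 5 and 6 are within 1 nucleotide of the center and positions 4 to 7 are within 2.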

Biallelic markers can be defined as genome-derived polynucleotides having between 2 and
100, preferably between 20, 30, or 40 and 60, and more preferably about 47 nucleotides in length,
which exhibit biallelic polymorphism at one single base position. Each biallelic marker therefore
corresponds to two forms of a polynucleotide sequence included in a gene which, when compared
with one another, present a nucleotide modification at one position.

The term "upstream" is used herein to refer to a location which is toward the 5' end of the
polynucleotide from a specific reference point.

The terms "base paired" and "Watson & Crick base paired" are used interchangeably herein
to refer to nucleotides which can be hydrogen bonded to one another by virtue of their sequence
identities in a manner like that found in double-helical DNA with thymine or uracil residues linked
to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three
hydrogen bonds (See Stryer, L., Biochemistry, 4th edition, 1995).

The terms "complementary" or "complement thereof" are used herein to refer to the
sequences of polynucleotides which are capable of forming Watson & Crick base pairing with another
specified polynucleotide throughout the entirety of the complementary region. For the purpose of the
present invention, a first polynucleotide is deemed to be complementary to a second polynucleotide
when each base in the first polynucleotide is paired with its complementary base. Complementary
bases are, generally, A and T (or A and U), or C and G. "Complement" is used herein as a synonym
for "complementary polynucleotide", "complementary nucleic acid" and "complementary
nucleotide sequence". These terms are applied to pairs of polynucleotides based solely upon their
sequences and not any particular set of conditions under which the two polynucleotides would
actually bind.
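
Because complementarity is defined here on sequence alone, it can be checked without reference to hybridization conditions. The sketch below is illustrative only: it assumes both strands are written 5' to 3' and pair in antiparallel orientation, and the names `reverse_complement` and `is_complementary` are hypothetical.

```python
# A pairs with T (or U), C pairs with G. The complement is returned as DNA.
_PAIRING = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(sequence):
    """Return the complement of a sequence, read 5' to 3', in DNA letters."""
    return sequence.translate(_PAIRING)[::-1]


def is_complementary(first, second):
    """True if every base of `first` pairs with `second` over its full length.

    Assumes both sequences are written 5' to 3' and pair antiparallel.
    Uracil in `second` is treated as equivalent to thymine.
    """
    if len(first) != len(second):
        return False
    expected = reverse_complement(first).upper()
    return expected == second.upper().replace("U", "T")
```

With this sketch, a hypothetical strand "ATGC" has the complement "GCAT", and an RNA strand "GCAU" would also be deemed complementary to it.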
1- Polynucleotides

The invention also relates to variants and fragments of the polynucleotides described herein,
particularly of an olfactory receptor gene containing one or more biallelic markers according to the
invention.

Variants of polynucleotides, as the term is used herein, are polynucleotides that differ from a 
reference polynucleotide. A variant of a polynucleotide may be a naturally occurring variant such as 
a naturally occurring allelic variant, or it may be a variant that is not known to occur naturally. Such 
non-naturally occurring variants of the polynucleotide may be made by mutagenesis techniques, 
including those applied to polynucleotides, cells or organisms. Generally, differences are limited so
that the nucleotide sequences of the reference and the variant are closely similar overall and, in many 
regions, identical. 

Variants of polynucleotides according to the invention include, without being limited to,
nucleotide sequences at least 95% identical to a nucleic acid selected from the group consisting of
SEQ ID Nos 1-11, or to any polynucleotide fragment of at least 12 consecutive nucleotides from a
nucleic acid selected from the group consisting of SEQ ID Nos 1-11, and preferably at least 99%
identical, more particularly at least 99.5% identical, and most preferably at least 99.8% identical to a
nucleic acid selected from the group consisting of SEQ ID Nos 1-11, or to any polynucleotide
fragment of at least 12 consecutive nucleotides from a nucleic acid selected from the group
consisting of SEQ ID Nos 1-11.
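
The identity thresholds above can be illustrated with a simple position-by-position comparison. This is a sketch under a strong simplifying assumption: the two sequences are already aligned, of equal length and without gaps. The specification does not state here how percent identity is computed, and in practice a sequence alignment program would be used. The names `percent_identity` and `meets_identity_threshold` are hypothetical.

```python
def percent_identity(reference, candidate):
    """Ungapped percent identity between two pre-aligned, equal-length sequences."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(reference.upper(), candidate.upper()))
    return 100.0 * matches / len(reference)


def meets_identity_threshold(reference, candidate, threshold=95.0):
    """True if the candidate is at least `threshold` percent identical."""
    return percent_identity(reference, candidate) >= threshold
```

For instance, a 20-nucleotide candidate differing from its reference at a single position is 95% identical and meets the lowest threshold, whereas two differences give 90% and do not.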

Changes in the nucleotide sequence of a variant may be silent, which means that they do not alter the
amino acids encoded by the polynucleotide. However, nucleotide changes may also result in amino
acid substitutions, additions, deletions, fusions and truncations in the polypeptide encoded by the
reference sequence. The substitutions, deletions or additions may involve one or more nucleotides.
The variants may be altered in coding or non-coding regions or both. Alterations in the coding
regions may produce conservative or non-conservative amino acid substitutions, deletions or
additions.
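
Whether a nucleotide change in a coding region is silent can be checked by translating both sequences and comparing the encoded amino acids. The following is a minimal sketch that assumes the standard genetic code, an in-frame coding sequence and substitutions only (equal lengths); the names `CODON_TABLE`, `translate` and `is_silent` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(coding_dna):
    """Translate an in-frame DNA coding sequence; '*' marks a stop codon."""
    coding_dna = coding_dna.upper()
    usable = len(coding_dna) - len(coding_dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[coding_dna[i:i + 3]] for i in range(0, usable, 3))


def is_silent(reference, variant):
    """True if a substitution-only variant encodes the same amino acids."""
    return len(reference) == len(variant) and translate(reference) == translate(variant)
```

As an example, changing GCT to GCC leaves alanine unchanged and is silent, whereas changing GAT to GAA replaces aspartic acid with glutamic acid and is not.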

In the context of the present invention, particularly preferred embodiments are those in
which the polynucleotides encode polypeptides which retain substantially the same biological
function or activity as the mature olfactory receptor protein, or those in which the polynucleotides
encode polypeptides which maintain or increase a particular biological activity, while reducing a
second biological activity.

A polynucleotide fragment is a polynucleotide whose sequence is fully comprised within
part of a given nucleotide sequence, preferably the nucleotide sequence of an olfactory receptor gene
of the invention, and variants thereof. The fragment can be a portion of a coding or non-coding
region of the olfactory receptor gene cluster. Preferably, such fragments comprise at least one of the
biallelic markers A1 to A13 or the complements thereto or a biallelic marker in linkage
disequilibrium with one or more of the biallelic markers A1 to A13, for which the respective
locations in the sequence listing are provided in Table 2.

Such fragments may be "free-standing", i.e. not part of or fused to other polynucleotides, or 
they may be comprised within a single larger polynucleotide of which they form a part or region. 
However, several fragments may be comprised within a single larger polynucleotide.

As representative examples of polynucleotide fragments of the invention, there may be
mentioned those which are about 4, 6, 8, 15, 20, 25, 40, 10 to 30, 30 to 55, 50 to 100, 75 to
100 or 100 to 200 nucleotides in length. Preferred are those fragments of about 47 nucleotides
in length, such as those comprising at least one of the biallelic markers A1 to A13 of the olfactory
receptor gene. Optionally, such fragments may consist of, or consist essentially of, a contiguous span
of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, 500 or 1000 nucleotides in length.
A set of preferred fragments contains at least one of the biallelic markers A1 to A13 of the olfactory
receptor gene which are described herein or the complements thereto.

2- Polypeptides 

The invention also relates to variants, fragments, analogs and derivatives of the polypeptides
described herein, including mutated olfactory receptor proteins.

The variant may be 1) one in which one or more of the amino acid residues are substituted
with a conserved or non-conserved amino acid residue and such substituted amino acid residue may
or may not be one encoded by the genetic code, or 2) one in which one or more of the amino acid
residues includes a substituent group, or 3) one in which the mutated olfactory receptor is fused with
another compound, such as a compound to increase the half-life of the polypeptide (for example,
polyethylene glycol), or 4) one in which additional amino acids are fused to the mutated
olfactory receptor, such as a leader or secretory sequence or a sequence which is employed for
purification of the mutated olfactory receptor or a preprotein sequence. Such variants are deemed to
be within the scope of those skilled in the art.

In the case of an amino acid substitution in the amino acid sequence of a polypeptide
according to the invention, one or several amino acids can be replaced by "equivalent" amino acids.
The expression "equivalent" amino acid is used herein to designate any amino acid that may be
substituted for one of the amino acids having similar properties, such that one skilled in the art of
peptide chemistry would expect the secondary structure and hydropathic nature of the polypeptide to
be substantially unchanged. Generally, the following groups of amino acids represent equivalent
changes: (1) Ala, Pro, Gly, Glu, Asp, Gln, Asn, Ser, Thr; (2) Cys, Ser, Tyr, Thr; (3) Val, Ile, Leu,
Met, Ala, Phe; (4) Lys, Arg, His; (5) Phe, Tyr, Trp, His.
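
By way of illustration only, the five equivalence groups listed above can be encoded as sets and queried to decide whether a proposed substitution counts as an "equivalent" change. The following is a minimal sketch; the function name is_equivalent_change is hypothetical and is not part of the disclosure, and the rule applied (two residues are equivalent when they share at least one group) is an assumed reading of the list.

```python
# Equivalence groups as listed in the text (three-letter residue codes).
EQUIVALENCE_GROUPS = [
    {"Ala", "Pro", "Gly", "Glu", "Asp", "Gln", "Asn", "Ser", "Thr"},
    {"Cys", "Ser", "Tyr", "Thr"},
    {"Val", "Ile", "Leu", "Met", "Ala", "Phe"},
    {"Lys", "Arg", "His"},
    {"Phe", "Tyr", "Trp", "His"},
]


def is_equivalent_change(original, substitute):
    """Return True if both residues appear together in at least one group.

    Assumption: sharing any one group is sufficient for equivalence.
    """
    return any(
        original in group and substitute in group
        for group in EQUIVALENCE_GROUPS
    )


if __name__ == "__main__":
    for pair in [("Lys", "Arg"), ("Ala", "Val"), ("Lys", "Asp")]:
        print(pair, is_equivalent_change(*pair))
```

Because some residues (for example Ala, Ser, Thr, Phe, Tyr and His) belong to more than one group, the relation is not transitive: Ala is equivalent to both Gly and Val under this reading, while Gly and Val share no group.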

More particularly, a variant olfactory receptor polypeptide comprises amino acid changes
ranging from 1, 2, 3, 4, 5, 10 to 20 substitutions, additions or deletions of one amino acid, preferably
from 1 to 10, more preferably from 1 to 5 and most preferably from 1 to 3 substitutions, additions or
deletions of one amino acid. The preferred amino acid changes are those which have little or no
influence on the biological activity or the capacity of the variant olfactory receptor polypeptide to
bind to antibodies raised against a native olfactory receptor protein.

A specific, but not restrictive, embodiment of a modified peptide molecule of interest
according to the present invention, namely a peptide molecule which is resistant to
proteolysis, is a peptide in which the -CONH- peptide bond is modified and replaced by a (CH2NH)
reduced bond, a (NHCO) retro-inverso bond, a (CH2-O) methylene-oxy bond, a (CH2-S)
thiomethylene bond, a (CH2CH2) carba bond, a (COCH2) ketomethylene bond, a (CHOH-CH2)
hydroxyethylene bond, a (N-N) bond, an E-alkene bond or also a -CH=CH- bond.

The polypeptide according to the invention could have post-translational modifications. For
example, it can present the following modifications: acylation, disulfide bond formation,
prenylation, carboxymethylation and phosphorylation.

A polypeptide fragment is a polypeptide whose sequence is fully comprised within part of a
given polypeptide sequence, preferably a polypeptide encoded by an olfactory receptor gene and
variants thereof.

Such fragments may be "free-standing", i.e. not part of or fused to other polypeptides, or
they may be comprised within a single larger polypeptide of which they form a part or region.
Several fragments may also be comprised within a single larger polypeptide.

As representative examples of polypeptide fragments of the invention, there may be
mentioned those which are about 5, 6, 7, 8, 9 or 10 to 15, 10 to 20, 15 to 40, or 30 to 55
amino acids long. Preferred polypeptide fragments according to the invention comprise a contiguous
span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15,
20, 25, 30, 40, 50, or 100 amino acids of one amino acid sequence. Preferred are those fragments
containing at least one amino acid mutation in the olfactory receptor protein under consideration.

Identity between nucleic acids or polypeptides 

The terms "percentage of sequence identity" and "percentage homology" are used
interchangeably herein to refer to comparisons among polynucleotides and polypeptides, and are
determined by comparing two optimally aligned sequences over a comparison window, wherein the
portion of the polynucleotide or polypeptide sequence in the comparison window may comprise
additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise
additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by
determining the number of positions at which the identical nucleic acid base or amino acid residue
occurs in both sequences to yield the number of matched positions, dividing the number of matched
positions by the total number of positions in the window of comparison and multiplying the result by
100 to yield the percentage of sequence identity. Homology is evaluated either using any of the
variety of sequence comparison algorithms and programs known in the art, or by visual inspection.
Such algorithms and programs include, but are by no means limited to, TBLASTN, BLASTP,
FASTA, TFASTA, and CLUSTALW (Pearson and Lipman, 1988; Altschul et al., 1990; Thompson
et al., 1994; Higgins et al., 1996; Altschul et al., 1993). In a particularly
preferred embodiment, protein and nucleic acid sequence homologies are evaluated using the Basic
Local Alignment Search Tool ("BLAST") which is well known in the art (see, e.g., Karlin and
Altschul, 1990; Altschul et al., 1990, 1993, 1997). In particular, five specific BLAST programs are
used to perform the following tasks:

(1) BLASTP and BLAST3 compare an amino acid query sequence against a protein 
sequence database; 

(2) BLASTN compares a nucleotide query sequence against a nucleotide sequence 
database; 

(3) BLASTX compares the six-frame conceptual translation products of a query nucleotide
sequence (both strands) against a protein sequence database;

(4) TBLASTN compares a query protein sequence against a nucleotide sequence database
translated in all six reading frames (both strands); and

(5) TBLASTX compares the six-frame translations of a nucleotide query sequence against
the six-frame translations of a nucleotide sequence database.
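
The "six-frame conceptual translation" on which BLASTX, TBLASTN and TBLASTX rely can be illustrated with a short sketch. It is not the BLAST implementation; it only shows, under the assumption of the standard genetic code, how three reading frames on each strand are derived from one nucleotide sequence. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to conceptual translations."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

Stop codons are rendered as "*" and are not used to truncate the translation, since a database search tool compares whole frames rather than predicted proteins.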

The BLAST programs identify homologous sequences by identifying similar segments,
which are referred to herein as "high-scoring segment pairs," between a query amino or nucleic acid
sequence and a test sequence which is preferably obtained from a protein or nucleic acid sequence
database. High-scoring segment pairs are preferably identified (i.e., aligned) by means of a scoring
matrix, many of which are known in the art. Preferably, the scoring matrix used is the BLOSUM62
matrix (Gonnet et al., 1992; Henikoff and Henikoff, 1993). Less preferably, the PAM or PAM250
matrices may also be used (see, e.g., Schwartz and Dayhoff, eds., 1978). The BLAST programs
evaluate the statistical significance of all high-scoring segment pairs identified, and preferably
select those segments which satisfy a user-specified threshold of significance, such as a
user-specified percent homology. Preferably, the statistical significance of a high-scoring segment pair is
evaluated using the statistical significance formula of Karlin (see, e.g., Karlin and Altschul, 1990).
The BLAST programs may be used with the default parameters or with modified parameters
provided by the user.
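
The percentage of sequence identity defined earlier in this section (matched positions divided by the total number of positions in the comparison window, multiplied by 100) can be sketched as follows. The sketch assumes that the two sequences have already been optimally aligned and are supplied as equal-length strings in which "-" marks a gap; the function name percent_identity is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage identity over a comparison window.

    Both inputs are rows of the same alignment and must have equal
    length. Gap columns count toward the window length but never
    count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a, aligned_b)
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Window of 6 positions, 4 of which are identical.
    print(round(percent_identity("ACGT-A", "ACCTGA"), 1))
```

The denominator here is the full window, gaps included, which follows the definition given in the text; other conventions (for example dividing by the shorter sequence length) give different values and are not used here.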

Stringent Hybridization Conditions 

By way of example and not limitation, procedures using conditions of high stringency are as
follows: Prehybridization of filters containing DNA is carried out for 8 h to overnight at 65°C in
buffer composed of 6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll,
0.02% BSA, and 500 µg/ml denatured salmon sperm DNA. Filters are hybridized for 48 h at 65°C,
the preferred hybridization temperature, in prehybridization mixture containing 100 µg/ml denatured
salmon sperm DNA and 5-20 x 10^6 cpm of 32P-labeled probe. Alternatively, the hybridization step
can be performed at 65°C in the presence of SSC buffer, 1 x SSC corresponding to 0.15 M NaCl and
0.05 M Na citrate. Subsequently, filter washes can be done at 37°C for 1 h in a solution containing 2
x SSC, 0.01% PVP, 0.01% Ficoll, and 0.01% BSA, followed by a wash in 0.1 x SSC at 50°C for 45
min. Alternatively, filter washes can be performed in a solution containing 2 x SSC and 0.1% SDS,
or 0.5 x SSC and 0.1% SDS, or 0.1 x SSC and 0.1% SDS at 68°C for 15 minute intervals.
Following the wash steps, the hybridized probes are detectable by autoradiography. Other
conditions of high stringency which may be used are well known in the art and as cited in Sambrook
et al., 1989; and Ausubel et al., 1989. These hybridization conditions are suitable for a nucleic acid
molecule of about 20 nucleotides in length. Needless to say, the hybridization conditions
described above are to be adapted according to the length of the desired nucleic acid, following
techniques well known to the one skilled in the art. The suitable hybridization conditions may for
example be adapted according to the teachings disclosed in the book of Hames and Higgins (1985)
or in Sambrook et al. (1989).

HOMOLOGIES OF THE NOVEL OLFACTORY RECEPTOR GENE WITH 
KNOWN OLFACTORY RECEPTORS 

A comparison analysis of various olfactory receptor amino acid sequences, including the
novel sequences of the invention, has been performed with the alignment program Pileup and the
translation program MAP (Wisconsin Package version 8, GCG). The protein sequences were sorted
into different families and subfamilies, taking into account their Amino acid Sequence Identity
(ASI). It was observed that the Open Reading Frames of the OLF1 to OLF10 genes are genetically
clearly distinguished from the already known olfactory receptor sequences. For example, the
olfactory receptor OLF2 presents respectively 39.9%, 43.1% and 44.2% of identity with prior art
olfactory receptors referred to in Genbank as L35475, U58675_1 and Y10530. In addition, the
nucleotide sequences of Orf-2 to Orf-10 according to the invention are all grouped together, whereas
the nucleotide sequence Orf-1 of the invention forms a new family by itself. These amino acid sequence
comparison data clearly indicate that the novel olfactory receptor sequences of the invention share
common genetic characteristics (Orf-2 to Orf-10) or have specific characteristics (Orf-1) that are not
found in the prior art olfactory receptor sequences.

A. OLF1 TO OLF10 GENE POLYNUCLEOTIDES. 

The cluster of ten olfactory receptor genes has been found by the inventors to be located on
the human chromosome 11, more precisely within the 11q12-q13 locus of said chromosome as
described in Example 1.

1. Genomic sequences of the olfactory receptor gene 

The present invention concerns the genomic sequence of an olfactory receptor cluster. The
present invention encompasses the olfactory receptor gene, or olfactory receptor genomic sequences
consisting of, consisting essentially of, or comprising the sequence of SEQ ID No 1, a sequence
complementary thereto, as well as fragments and variants thereof. These polynucleotides may be
purified, isolated, or recombinant.

The invention also encompasses a purified, isolated, or recombinant polynucleotide
comprising a nucleotide sequence having at least 70, 75, 80, 85, 90, or 95% nucleotide identity with
a nucleotide sequence of SEQ ID No 1 or a complementary sequence thereto or a fragment thereof.
The nucleotide differences with respect to the nucleotide sequence of SEQ ID No 1 may be generally
randomly distributed throughout the entire nucleic acid. Nevertheless, preferred nucleic acids are
those wherein the nucleotide differences with respect to the nucleotide sequence of SEQ ID No 1 are
predominantly located outside the coding sequences contained in the exons. These nucleic acids, as
well as their fragments and variants, may be used as oligonucleotide primers or probes in order to
detect the presence of a copy of the olfactory receptor gene in a test sample, or alternatively in order
to amplify a target nucleotide sequence within the olfactory receptor sequences.

Another object of the invention consists of a purified, isolated, or recombinant nucleic acid
that hybridizes with the nucleotide sequence of SEQ ID No 1 or a complementary sequence thereto,
under stringent hybridization conditions as defined above.

Particularly preferred nucleic acids of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 1: 1-113643, 114064-127488, 127855-144460. Additional preferred nucleic
acids of the invention include isolated, purified, or recombinant polynucleotides comprising a
contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or
1000 nucleotides of SEQ ID No 1 or the complements thereof, wherein said contiguous span
comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 1-10000,
10001-20000, 20001-30000, 30001-40000, 40001-50000, 50001-60000, 60001-70000, 70001-80000,
80001-90000, 90001-100000, 100001-110000, 110001-120000, 120001-130000, 130001-140000,
and 140001-144460. Further preferred nucleic acids of the invention include isolated,
purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25,
30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the
complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the
following nucleotide positions of SEQ ID No 1: 1-5000, 5001-10000, 10001-15000, 15001-20000,
20001-25000, 25001-30000, 30001-35000, 35001-40000, 40001-45000, 45001-50000, 50001-55000,
55001-60000, 60001-65000, 65001-70000, 70001-75000, 75001-80000, 80001-85000,
85001-90000, 90001-95000, 95001-100000, 100001-105000, 105001-110000, 110001-115000,
115001-120000, 120001-125000, 125001-130000, 130001-135000, 135001-140000, and
140001-144460.
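
The position windows enumerated above follow a regular pattern: SEQ ID No 1 is cut into consecutive blocks of 10,000 or 5,000 nucleotides, with a shorter final block ending at position 144460. A minimal sketch, given purely as an illustration, regenerates these windows and reports which of them a candidate contiguous span touches. The helper names windows and windows_touched are hypothetical, and positions are treated as 1-based and inclusive, as in the text.

```python
SEQ_LENGTH = 144460  # last position of SEQ ID No 1 quoted in the text


def windows(total_length, size):
    """Return consecutive 1-based inclusive (start, end) windows."""
    return [
        (start, min(start + size - 1, total_length))
        for start in range(1, total_length + 1, size)
    ]


def windows_touched(span_start, span_end, total_length=SEQ_LENGTH, size=10000):
    """Return the windows sharing at least one position with the span."""
    return [
        (start, end)
        for start, end in windows(total_length, size)
        if span_start <= end and span_end >= start
    ]


if __name__ == "__main__":
    # A 20-nucleotide span straddling the first window boundary.
    print(windows_touched(9995, 10014))
```

A span of 20 nucleotides starting at position 9995 touches both the 1-10000 and the 10001-20000 windows, which is why short probes near a boundary can satisfy the positional condition for two windows at once.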
The olfactory receptor genomic nucleic acid comprises 10 open reading frames, each carried
by a single exon and encoding a polypeptide designated OLF1 to OLF10. The open reading frame
positions of OLF1 to OLF10 in SEQ ID No 1 are given as features in the sequence listing and are
also detailed below in Table A.

Two truncated ubiquitin polypeptides Ubi1 and Ubi2, unrelated to olfactory receptor coding
sequences, are encoded on the complementary strand of the olfactory receptor gene. The
complementary sequence of the Ubi1 ORF is located between the nucleotide in position 114063 and
the nucleotide in position 113644 of the nucleotide sequence of SEQ ID No 1. The complementary
sequence of the Ubi2 ORF is located between the nucleotide in position 127854 and the nucleotide
in position 127489 of the nucleotide sequence of SEQ ID No 1.



Table A

          Coding regions                        Non-coding regions
Name      Position in SEQ ID No 1     Name      Position in SEQ ID No 1
          Beginning     End                     Beginning     End
OLF1      2406          2600          NC1       1             2405
OLF2      9711          10658         NC2       2601          9710
OLF3      24851         25369         NC3       10659         24850
OLF4      45714         46661         NC4       25370         45713
OLF5      80198         81115         NC5       46662         80197
OLF6      96291         96902         NC6       81116         96290
OLF7      110758        111564        NC7       96903         110757
OLF8      122525        122887        NC8       111565        122524
OLF9      132454        133389        NC9       122888        132453
OLF10     143398        143577        NC10      133390        143397
                                      NC11      143578        144460
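
The non-coding coordinates of Table A are exactly the gaps left between, before and after the ten open reading frames on SEQ ID No 1. The following sketch, offered only as a consistency check on the table, recomputes them from the coding coordinates. The dictionary and function names are hypothetical; the coordinates are copied from Table A.

```python
# Coding regions from Table A (1-based, inclusive positions in SEQ ID No 1).
ORFS = {
    "OLF1": (2406, 2600),
    "OLF2": (9711, 10658),
    "OLF3": (24851, 25369),
    "OLF4": (45714, 46661),
    "OLF5": (80198, 81115),
    "OLF6": (96291, 96902),
    "OLF7": (110758, 111564),
    "OLF8": (122525, 122887),
    "OLF9": (132454, 133389),
    "OLF10": (143398, 143577),
}
SEQ_LENGTH = 144460  # end of NC11 in Table A


def orf_length(name):
    """Length in nucleotides of a named open reading frame."""
    start, end = ORFS[name]
    return end - start + 1


def non_coding_regions(orfs, total_length):
    """Derive NC1..NCn as the stretches not covered by any ORF."""
    regions = []
    cursor = 1
    for start, end in sorted(orfs.values()):
        if start > cursor:
            regions.append((cursor, start - 1))
        cursor = end + 1
    if cursor <= total_length:
        regions.append((cursor, total_length))
    return {f"NC{i}": region for i, region in enumerate(regions, 1)}


if __name__ == "__main__":
    for name, (start, end) in non_coding_regions(ORFS, SEQ_LENGTH).items():
        print(name, start, end)
    print({name: orf_length(name) for name in ORFS})
```

Each coding region in Table A has a length that is a multiple of three nucleotides, as expected for an open reading frame carried by a single exon.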



Thus, the invention embodies purified, isolated, or recombinant polynucleotides comprising
a nucleotide sequence selected from the group consisting of the 10 open reading frames of the
olfactory receptor gene, or a sequence complementary thereto.

The nucleic acid of SEQ ID No 1 also comprises non-coding portions flanking each of the
ten olfactory receptor open reading frames of the sense DNA strand.

The invention also embodies purified, isolated, or recombinant polynucleotides comprising a
nucleotide sequence selected from the group consisting of the non-coding regions contained in the
olfactory receptor gene cluster of SEQ ID No 1, or a sequence complementary thereto as well as
their fragments or variants. The term "non-coding" sequence refers to any nucleotide sequence
which does not encode an amino acid. The non-coding sequences encompass upstream and
downstream regions of the olfactory receptor ORFs of the invention, as well as regions located
between two successive olfactory receptor ORFs, as indicated in Table A which lists the 11
non-coding regions named from NC1 to NC11.

The nucleic acids defining the non-coding sequences of the polynucleotide of SEQ ID No 1
described above, as well as their fragments and variants, may be used as oligonucleotide primers or
probes in order to detect the presence of a copy of one of the olfactory receptor genes of the
invention in a test sample, or alternatively in order to amplify a target nucleotide sequence within the
cluster of olfactory receptor encoding sequences according to the invention.

While this section is entitled "Genomic Sequences of the olfactory receptor gene," it should
be noted that nucleic acid fragments of any size and sequence may also be comprised by the
polynucleotides described in this section, flanking the genomic sequences of olfactory receptor on
either side or between two or more such genomic sequences.

2. Coding regions of the olfactory receptor gene

The 10 olfactory receptor open reading frames are presented individually as SEQ ID Nos 2-11
in the appended sequence listing.

Thus, another object of the invention is a purified, isolated, or recombinant nucleic acid
comprising a nucleotide sequence selected from the group consisting of SEQ ID Nos 2-11,
complementary sequences thereto, as well as allelic variants, and fragments thereof. Moreover,
preferred polynucleotides of the invention include purified, isolated, or recombinant olfactory
receptor cDNAs consisting of, consisting essentially of, or comprising a sequence selected from the
group consisting of SEQ ID Nos 2-11.

The invention also pertains to a purified or isolated nucleic acid comprising a polynucleotide
having at least 95% nucleotide identity with a polynucleotide selected from the group consisting of
SEQ ID Nos 2-11, advantageously 99% nucleotide identity, preferably 99.5% nucleotide identity
and most preferably 99.8% nucleotide identity with a polynucleotide selected from the group
consisting of SEQ ID Nos 2-11, or a sequence complementary thereto or a biologically active
fragment thereof.

Another object of the invention relates to purified, isolated or recombinant nucleic acids
comprising a polynucleotide that hybridizes, under the stringent hybridization conditions defined
herein, with a polynucleotide selected from the group consisting of SEQ ID Nos 2-11, or a sequence
complementary thereto or a biologically active fragment thereof.

Particularly preferred nucleic acids of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of a sequence selected from the group
consisting of SEQ ID Nos 2-11 or the complements thereof. Additional preferred embodiments of
the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous
span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000
nucleotides of a sequence selected from the group consisting of SEQ ID Nos 2-11 or the
complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the
following nucleotide positions of said selected sequence: 1-50, 51-100, 101-150, 151-200, 201-250,
251-300, 301-350, 351-400, 401-450, 451-500, 501-550, 551-600, 601-650, 651-700, 701-750,
751-800, 801-850, 851-900, 901-the terminal nucleotide of the olfactory receptor coding regions, to the
extent that such nucleotide positions are consistent with the lengths of the particular olfactory
receptor coding region being referred to. Further preferred embodiments of the invention include
isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least 12, 15,
18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of a sequence
selected from the group consisting of SEQ ID Nos 2, 4, 7, 9 and 11, or the complements thereof,
wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of said selected sequence: 1-25, 26-50, 51-75, 76-100, 101-125, 126-150, 151-175,
176-200, 201-225, 226-250, 251-275, 276-300, 301-325, 326-350, 351-375, 376-400, 401-425, 426-450,
451-475, 476-500, 501-525, 526-550, 551-575, 576-the terminal nucleotide of the olfactory receptor
coding regions, to the extent that such nucleotide positions are consistent with the lengths of the
particular olfactory receptor coding region being referred to.

The present invention also embodies isolated, purified, and recombinant polynucleotides
encoding olfactory receptor polypeptides, wherein olfactory receptor polypeptides comprise an
amino acid sequence selected from the group consisting of SEQ ID Nos 12-21, a nucleotide
sequence complementary thereto, a fragment or a variant thereof. The present invention also
embodies isolated, purified, and recombinant polynucleotides which encode polypeptides
comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of a sequence selected from the
group consisting of SEQ ID Nos 12-21. In a preferred embodiment, the present invention embodies
isolated, purified, and recombinant polynucleotides which encode polypeptides comprising a
contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at
least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of a sequence selected from the group consisting
of SEQ ID Nos 12-21 wherein said contiguous span includes at least 1, 2, 3, 5 or 10 of the following
amino acid positions in said selected sequence: 1-20, 21-40, 41-60, 61-80, 81-100, 101-120,
121-140, 141-160, 161-180, 181-200, 201-220, 221-240, 241-260, 261-280, 281-300, 301-the terminal
amino acid of the olfactory receptor proteins, to the extent that such amino acid positions are
consistent with the lengths of the particular olfactory receptor protein being referred to. In another
preferred embodiment, the present invention embodies isolated, purified, and recombinant
polynucleotides which encode polypeptides comprising a contiguous span of at least 6 amino acids,
preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100
amino acids of a sequence selected from the group consisting of SEQ ID Nos 12, 14, 17, 19 or 21
wherein said contiguous span includes at least 1, 2, 3, 5 or 7 of the following amino acid positions in
said selected sequence: 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100,
101-110, 111-120, 121-130, 131-140, 141-150, 151-160, 161-170, 171-180, 181-190, 191-the terminal
amino acid of the olfactory receptor proteins, to the extent that such amino acid positions are
consistent with the lengths of the particular olfactory receptor protein being referred to.

In further preferred embodiments, the present invention embodies isolated, purified, and
recombinant polynucleotides which encode olfactory receptor polypeptides comprising a contiguous
span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12,
15, 20, 25, 30, 40, 50, or 100 amino acids of a sequence selected from the group consisting of SEQ
ID Nos 12-21, wherein said contiguous span includes at least one amino acid at the following
positions of said selected sequence:

i) 1-3, 10, 16, 21, 28, 33, 34, 36, 42-44, 46, 49, 53, 54, 57, 59, 63, and 64 for SEQ ID
No 12;

ii) 2, 4, 6, 8, 18, 25, 34, 37, 44, 52, 56, 80, 83, 89, 98, 101, 102, 113, 114, 117, 120,
139, 148, 158, 186, 195, 212, 219, 247, 266, 270, 280, 295, 298, 299, 301, 311, and
313-315 for SEQ ID No 13;

iii) 2-4, 6, 18, 21, 25, 34, 37, 98, 99, 102, 113, 114, 133, 143, 148, 158-163, 166, 167,
169, and 170 for SEQ ID No 14;

iv) 2, 4, 6, 8, 18, 25, 34, 37, 44, 52, 54, 56, 80, 83, 89, 98, 101, 102, 113, 114, 117, 120,
139, 148, 158, 186, 195, 212, 219, 247, 266, 270, 280, 298, 299, 311, and 313-315
for SEQ ID No 15;

v) 3, 18, 20, 25, 34, 47, 49, 67, 97, 100, 107, 108, 112, 113, 126, 135, 142, 146, 147,
157, 159-160, 194, 196, 228, 245, 264, 265, 269, 279, 298, and 302 for SEQ ID No
16;

vi) 2, 6, 18, 20, 33, 34, 37, 65, 68, 69, 72, 86, 88, 101, 107, 113, 114, 148, 158, 161,
164, 195, and 198 for SEQ ID No 17;

vii) 2, 6, 7, 52, 56, 67, 88, 94, 97, 110, 113, 116, 119, 120, 127, 135, 150, 153, 164, 174,
175, 180, 184, 217, 221, 259, 261, and 268 for SEQ ID No 18;

viii) 17, 18, 20, 28, 33, 35, 49-52, 105, 111, and 112 for SEQ ID No 19;

ix) 17, 20, 33, 35, 49-53, 56, 111, 112, 132, 138, 141, 147, 154, 157, 160, 163, 164,
194, 197, 204, 211, 214, 218, 219, 252, 265, 286, 295, 301, 303, 305, 306 and 309
for SEQ ID No 20; and

x) 9, 18, 26-28, 34, 47 and 50 for SEQ ID No 21, to the extent that such amino acid
positions are consistent with the lengths of the particular olfactory receptor protein
being referred to.
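
The position lists in items i) to x) mix single positions and ranges. As an illustration only, such a list can be parsed into a set of positions and used to test whether a given contiguous span of a polypeptide includes at least one listed position. The function names below are hypothetical, and positions are treated as 1-based and inclusive.

```python
def parse_positions(spec):
    """Parse a list such as '9, 18, 26-28, 34, 47 and 50' into a set of ints."""
    positions = set()
    for token in spec.replace(" and ", ",").split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            low, high = token.split("-")
            positions.update(range(int(low), int(high) + 1))
        else:
            positions.add(int(token))
    return positions


def span_includes_listed_position(span_start, span_end, positions):
    """True if the inclusive span covers at least one listed position."""
    return any(span_start <= p <= span_end for p in positions)


if __name__ == "__main__":
    # Item x) of the list above, for SEQ ID No 21.
    seq21 = parse_positions("9, 18, 26-28, 34, 47 and 50")
    print(sorted(seq21))
    print(span_includes_listed_position(10, 17, seq21))
    print(span_includes_listed_position(20, 26, seq21))
```

An 8-residue span covering positions 10 to 17 of SEQ ID No 21 falls between listed positions 9 and 18 and therefore does not meet the condition, whereas a span covering positions 20 to 26 does.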

Additional preferred fragments of the nucleotide sequences of SEQ ID Nos 2-11 are those
encoding olfactory receptor polypeptide fragments located outside the transmembrane domains of
the corresponding protein as located in boxes in Figure 1.

The above disclosed polynucleotides that contain only coding sequences derived from the
olfactory receptor ORFs may be expressed in a desired host cell or a desired host organism, when
said polynucleotides are placed under the control of suitable expression signals. Such a
polynucleotide, when placed under suitable expression signals, may be inserted in a vector for its
expression.

While this section is entitled "Coding regions of the olfactory receptor gene," it should be
noted that nucleic acid fragments of any size and sequence may also be comprised by the
polynucleotides described in this section, flanking the genomic sequences of olfactory receptor on
either side or between two or more such genomic sequences.

3. Polynucleotide Constructs

The terms "polynucleotide construct" and "recombinant polynucleotide" are used 
interchangeably herein to refer to linear or circular, purified or isolated polynucleotides that have 
been artificially designed and which comprise at least two nucleotide sequences that are not found as 
contiguous nucleotide sequences in their initial natural environment. 

DNA Construct That Enables Directing Temporal And Spatial olfactory receptor Gene Expression
In Recombinant Cell Hosts And In Transgenic Animals.

In order to study the physiological and phenotypic consequences of a lack of synthesis of the
olfactory receptor protein, both at the cell level and at the multicellular organism level, the
invention also encompasses DNA constructs and recombinant vectors enabling a conditional
expression of a specific allele of the olfactory receptor genomic sequence or cDNA and also of a
copy of this genomic sequence or cDNA harboring substitutions, deletions, or additions of one or
more bases with respect to the olfactory receptor nucleotide sequence of SEQ ID Nos 1-11, or a
fragment thereof, these base substitutions, deletions or additions being located in the coding regions
of the olfactory receptor genomic sequence or within the olfactory receptor open reading frames of
SEQ ID Nos 2-11. In a preferred embodiment, the olfactory receptor sequence
comprises a biallelic marker of the present invention, preferably one of the biallelic markers A1 to
A13.

The present invention embodies recombinant vectors comprising any one of the
polynucleotides described in the present invention. More particularly, the polynucleotide constructs
according to the present invention can comprise any of the polynucleotides described in the
"Genomic sequences of the olfactory receptor gene" section, the "Coding regions of the olfactory
receptor Gene" section, and the "Oligonucleotide probes and primers" section.

DNA Constructs Allowing Homologous Recombination: Replacement Vectors

A first preferred DNA construct will comprise, from 5'-end to 3'-end: (a) a first nucleotide
sequence that is comprised in the olfactory receptor genomic sequence; (b) a nucleotide sequence
comprising a positive selection marker, such as the marker for neomycin resistance (neo); and (c) a
second nucleotide sequence that is comprised in the olfactory receptor genomic sequence, and is
located on the genome downstream of the first olfactory receptor nucleotide sequence (a).

In a preferred embodiment, this DNA construct also comprises a negative selection marker
located upstream of the nucleotide sequence (a) or downstream of the nucleotide sequence (c).
Preferably, the negative selection marker comprises the thymidine kinase (tk) gene (Thomas et al.,
1986), the hygromycin beta gene (Te Riele et al., 1990), the hprt gene (Van der Lugt et al., 1991;
Reid et al., 1990) or the Diphtheria toxin A fragment (Dt-A) gene (Nada et al., 1993; Yagi et al., 1990).
Preferably, the positive selection marker is located within an olfactory receptor open reading frame
sequence so as to interrupt the sequence encoding an olfactory receptor protein. These replacement
vectors are described, for example, by Thomas et al. (1986; 1987), Mansour et al. (1988) and Koller
et al. (1992).

The first and second nucleotide sequences (a) and (c) may be located, without preference, within an
olfactory receptor regulatory sequence, an intronic sequence, an exon sequence or a sequence
containing regulatory and/or intronic and/or exon sequences. The size of the nucleotide
sequences (a) and (c) ranges from 1 to 50 kb, preferably from 1 to 10 kb, more preferably from 2 to
6 kb and most preferably from 2 to 4 kb.

DNA Constructs Allowing Homologous Recombination: Cre-LoxP System. 

These new DNA constructs make use of the site specific recombination system of the P1
phage. The P1 phage possesses a recombinase called Cre which interacts specifically with a 34 base
pair loxP site. The loxP site is composed of two palindromic sequences of 13 bp separated by an 8
bp conserved sequence (Hoess et al., 1986). The recombination by the Cre enzyme between two
loxP sites having an identical orientation leads to the deletion of the DNA fragment flanked by these sites.
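
The deletion outcome described above can be illustrated with a short Python sketch. It is a string-level model only, not a simulation of the enzyme. The 34 bp wild-type loxP sequence used here is the commonly reported one and is an assumption, since it is not given in this document; the function name `cre_excise` and the toy construct are hypothetical.

```python
# Commonly reported wild-type loxP site (assumption; not taken from this document):
# 13 bp arm + 8 bp spacer + 13 bp arm = 34 bp.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(dna: str, site: str = LOXP) -> str:
    """Model Cre-mediated deletion between two directly repeated loxP sites.

    The segment between the first two same-orientation sites is removed
    together with one copy of the site, so a single loxP site remains.
    The input is returned unchanged if fewer than two sites are found.
    """
    first = dna.find(site)
    if first == -1:
        return dna
    second = dna.find(site, first + len(site))
    if second == -1:
        return dna
    # Keep everything up to and including the first site, then resume
    # immediately after the second site.
    return dna[: first + len(site)] + dna[second + len(site):]


# Hypothetical construct: 5' arm, loxP-flanked marker, 3' arm.
construct = "GGGAAA" + LOXP + "ATGCCCTGA" + LOXP + "TTTCCC"
excised = cre_excise(construct)
```

In this toy construct the marker segment disappears and one loxP site is left between the two arms, which mirrors the removal of a selectable marker flanked by loxP sites.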

The Cre-loxP system used in combination with a homologous recombination technique was
first described by Gu et al. (1993, 1994). Briefly, a nucleotide sequence of interest to be
inserted in a targeted location of the genome harbors at least two loxP sites in the same orientation
and located at the respective ends of a nucleotide sequence to be excised from the recombinant
genome. The excision event requires the presence of the recombinase (Cre) enzyme within the
nucleus of the recombinant cell host. The recombinase enzyme may be brought at the desired time
either by (a) incubating the recombinant cell hosts in a culture medium containing this enzyme, by
injecting the Cre enzyme directly into the desired cell, such as described by Araki et al. (1995), or by
lipofection of the enzyme into the cells, such as described by Baubonis et al. (1993); (b) transfecting
the cell host with a vector comprising the Cre coding sequence operably linked to a promoter
functional in the recombinant cell host, which promoter is optionally inducible, said vector being
introduced in the recombinant cell host, such as described by Gu et al. (1993) and Sauer et al. (1988); or
(c) introducing in the genome of the cell host a polynucleotide comprising the Cre coding sequence
operably linked to a promoter functional in the recombinant cell host, which promoter is optionally
inducible, said polynucleotide being inserted in the genome of the cell host either by a random
insertion event or a homologous recombination event, such as described by Gu et al. (1994).

In a specific embodiment, the vector containing the sequence to be inserted in the olfactory
receptor gene by homologous recombination is constructed in such a way that selectable markers are
flanked by loxP sites of the same orientation. It is then possible, by treatment with the Cre enzyme, to
eliminate the selectable markers while leaving the olfactory receptor sequences of interest that have
been inserted by a homologous recombination event. Again, two selectable markers are needed: a
positive selection marker to select for the recombination event and a negative selection marker to
select for the homologous recombination event. Vectors and methods using the Cre-loxP system are
described by Zou et al. (1994).

Thus, a second preferred DNA construct of the invention comprises, from 5'-end to 3'-end:
(a) a first nucleotide sequence that is comprised in the olfactory receptor genomic sequence; (b) a
nucleotide sequence comprising a polynucleotide encoding a positive selection marker, said
nucleotide sequence comprising additionally two sequences defining a site recognized by a
recombinase, such as a loxP site, the two sites being placed in the same orientation; and (c) a second
nucleotide sequence that is comprised in the olfactory receptor genomic sequence, and is located on
the genome downstream of the first olfactory receptor nucleotide sequence (a).

The sequences defining a site recognized by a recombinase, such as a loxP site, are 
preferably located within the nucleotide sequence (b) at suitable locations bordering the nucleotide 
sequence for which the conditional excision is sought. In one specific embodiment, two loxP sites 
are located at each side of the positive selection marker sequence, in order to allow its excision at a
desired time after the occurrence of the homologous recombination event.

In a preferred embodiment of a method using the third DNA construct described above, the
excision of the polynucleotide fragment bordered by the two sites recognized by a recombinase,
preferably two loxP sites, is performed at a desired time, due to the presence within the genome of
the recombinant host cell of a sequence encoding the Cre enzyme operably linked to a promoter
sequence, preferably an inducible promoter, more preferably a tissue-specific promoter sequence and
most preferably a promoter sequence which is both inducible and tissue-specific, such as described
by Gu et al. (1994).

The presence of the Cre enzyme within the genome of the recombinant cell host may result
from the breeding of two transgenic animals, the first transgenic animal bearing the olfactory
receptor-derived sequence of interest containing the loxP sites as described above and the second
transgenic animal bearing the Cre coding sequence operably linked to a suitable promoter sequence,
such as described by Gu et al. (1994).

Spatio-temporal control of the Cre enzyme expression may also be achieved with an
adenovirus based vector that contains the Cre gene thus allowing infection of cells, or in vivo
infection of organs, for delivery of the Cre enzyme, such as described by Anton and Graham (1995)
and Kanegae et al. (1995).
The DNA constructs described above may be used to introduce a desired nucleotide
sequence of the invention, preferably an olfactory receptor genomic sequence or an olfactory
receptor coding region sequence, and most preferably an altered copy of an olfactory receptor
genomic or coding region sequence, within a predetermined location of the targeted genome,
leading either to the generation of an altered copy of a targeted gene (knock-out homologous
recombination) or to the replacement of a copy of the targeted gene by another copy sufficiently
homologous to allow a homologous recombination event to occur (knock-in homologous
recombination). In a specific embodiment, the DNA constructs described above may be used to
introduce an olfactory receptor genomic sequence or an olfactory receptor coding region sequence
comprising at least one biallelic marker of the present invention, preferably at least one biallelic
marker selected from the group consisting of A1 to A13.

Nuclear Antisense DNA Constructs 

Other compositions contain a vector of the invention comprising an oligonucleotide
fragment of the nucleic sequence SEQ ID Nos 2-11, preferably a fragment including the start codon
of the olfactory receptor gene, as an antisense tool that inhibits the expression of the corresponding
olfactory receptor gene. Preferred methods using antisense polynucleotides according to the present
invention are the procedures described by Sczakiel et al. (1995) or those described in PCT
Application No WO 95/24223.

Preferred antisense polynucleotides according to the present invention are complementary to 
a sequence of the mRNAs of olfactory receptor that contains the translation initiation codon ATG. 
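
As a minimal sketch of this idea, the Python snippet below derives an antisense oligonucleotide complementary to a window around an ATG. The helper names, the flank size and the example sequence are hypothetical, the sequence is written as DNA for simplicity, and the code simply takes the first ATG it finds, which in a real transcript need not be the translation initiation codon.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def antisense_around_start(cdna: str, flank: int = 9) -> str:
    """Return an antisense oligo covering the first ATG plus `flank` bases per side.

    The window is clipped at the sequence ends. The first ATG is used as a
    stand-in for the initiation codon (an assumption of this sketch).
    Raises ValueError if no ATG is present.
    """
    cdna = cdna.upper()
    start = cdna.find("ATG")
    if start == -1:
        raise ValueError("no ATG found")
    lo = max(0, start - flank)
    hi = min(len(cdna), start + 3 + flank)
    return reverse_complement(cdna[lo:hi])


# Hypothetical sense-strand fragment, for illustration only.
sense = "GCCACCATGGAGAACTGG"
oligo = antisense_around_start(sense, flank=6)
```

The resulting oligonucleotide contains CAT, the reverse complement of ATG, so it can pair with the region of the sense strand that carries the start codon.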

Preferably, the antisense polynucleotides of the invention have a 3' polyadenylation signal
that has been replaced with a self-cleaving ribozyme sequence, such that RNA polymerase II
transcripts are produced without poly(A) at their 3' ends, these antisense polynucleotides being
incapable of export from the nucleus, such as described by Liu et al. (1994). In a preferred
embodiment, these olfactory receptor antisense polynucleotides also comprise, within the ribozyme
cassette, a histone stem-loop structure to stabilize cleaved transcripts against 3'-5' exonucleolytic
degradation, such as the structure described by Eckner et al. (1991).

4. Oligonucleotide probes and primers

Polynucleotides derived from the olfactory receptor gene are useful in order to detect the
presence of at least a copy of a nucleotide sequence of SEQ ID Nos 1-11, or a fragment,
complement, or variant thereof in a test sample, preferably a human olfactory epithelium tissue or
isolated human olfactory epithelium cells.

Particularly preferred probes and primers of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 1: 1-113643, 114064-127488, 127855-144460. Additional preferred probes
and primers of the invention include isolated, purified, or recombinant polynucleotides comprising a
contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or
1000 nucleotides of SEQ ID No 1 or the complements thereof, wherein said contiguous span
comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 1: 1-10000,
10001-20000, 20001-30000, 30001-40000, 40001-50000, 50001-60000, 60001-70000,
70001-80000, 80001-90000, 90001-100000, 100001-110000, 110001-120000, 120001-130000,
130001-140000, and 140001-144460. Further preferred probes and primers of the invention include isolated,
purified, or recombinant polynucleotides comprising a contiguous span of 12, 15, 18, 20, 25, 30, 35,
40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 1: 1-5000, 5001-10000, 10001-15000, 15001-20000, 20001-25000,
25001-30000, 30001-35000, 35001-40000, 40001-45000, 45001-50000, 50001-55000, 55001-60000,
60001-65000, 65001-70000, 70001-75000, 75001-80000, 80001-85000, 85001-90000,
90001-95000, 95001-100000, 100001-105000, 105001-110000, 110001-115000, 115001-120000,
120001-125000, 125001-130000, 130001-135000, 135001-140000, and 140001-144460.

Other particularly preferred probes and primers of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
45 or 50 nucleotides of a sequence selected from the group consisting of SEQ ID Nos 2-11 or the
complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the
following nucleotide positions of said selected sequence: 1-50, 51-100, 101-150, 151-200, 201-250,
251-300, 301-350, 351-400, 401-450, 451-500, 501-550, 551-600, 601-650, 651-700, 701-750,
751-800, 801-850, 851-900, 901-the terminal nucleotide of the olfactory receptor coding regions, to the
extent that such nucleotide positions are consistent with the lengths of the particular olfactory
receptor coding region being referred to. Further preferred probes and primers of the invention
include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least
12, 15, 18, 20, 22 or 25 nucleotides of a sequence selected from the group consisting of SEQ ID Nos
2, 4, 7, 9 and 11, or the complements thereof, wherein said contiguous span comprises at least 1, 2,
3, 5, or 10 of the following nucleotide positions of said selected sequence: 1-25, 26-50, 51-75,
76-100, 101-125, 126-150, 151-175, 176-200, 201-225, 226-250, 251-275, 276-300, 301-325, 326-350,
351-375, 376-400, 401-425, 426-450, 451-475, 476-500, 501-525, 526-550, 551-575, 576-the
terminal nucleotide of the olfactory receptor coding regions, to the extent that such nucleotide
positions are consistent with the lengths of the particular olfactory receptor coding region being
referred to.

Thus, the invention also relates to nucleic acid probes characterized in that they hybridize 
specifically, under the stringent hybridization conditions defined above, with a nucleic acid selected 
from the group consisting of SEQ ID Nos 1-11, a variant thereof and a sequence complementary
thereto. 

In one embodiment the invention encompasses isolated, purified, and recombinant
polynucleotides consisting of, or consisting essentially of a contiguous span of 8 to 50 nucleotides of
SEQ ID No 1 and the complement thereof, wherein said span includes an olfactory receptor-related
biallelic marker in said sequence; optionally, wherein said olfactory receptor-related biallelic
marker is selected from the group consisting of A1 to A13, and the complements thereof; optionally,
wherein said contiguous span is 18 to 47 nucleotides in length and said biallelic marker is within 4
nucleotides of the center of said polynucleotide; optionally, wherein said polynucleotide consists of
said contiguous span and said contiguous span is 25 nucleotides in length and said biallelic marker is
at the center of said polynucleotide; optionally, wherein the 3' end of said contiguous span is present
at the 3' end of said polynucleotide; and optionally, wherein the 3' end of said contiguous span is
located at the 3' end of said polynucleotide and said biallelic marker is present at the 3' end of said
polynucleotide. In a preferred embodiment, said probes comprise, consist of, or consist
essentially of a sequence selected from the following sequences: P1 to P13 and the complementary
sequences thereto, for which the respective locations in the sequence listing are provided in Table 3.
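
The 25-nucleotide, marker-centered case can be sketched in Python as follows. The function name, the example sequence and the marker coordinate are hypothetical; the actual marker positions are those given in the sequence listing and Table 3, which are not reproduced here.

```python
def centered_probe(seq: str, marker_pos: int, length: int = 25) -> str:
    """Return a probe of odd `length` with the marker base at its center.

    `marker_pos` is 1-based (an assumption of this sketch, matching the way
    positions are usually quoted for a sequence listing).
    Raises ValueError if the window would run past either end of `seq`.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd to center a single base")
    half = length // 2
    idx = marker_pos - 1  # convert to a 0-based index
    if idx - half < 0 or idx + half >= len(seq):
        raise ValueError("marker too close to a sequence end")
    return seq[idx - half: idx + half + 1]


# Hypothetical 40-base sequence with a marker assumed at position 20.
sequence = "ACGT" * 10
probe = centered_probe(sequence, marker_pos=20)
center_base = probe[len(probe) // 2]
```

With a 25-mer, twelve bases flank the marker on each side, so `center_base` is the base at the marker position.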
In another embodiment the invention encompasses isolated, purified and recombinant
polynucleotides comprising, consisting of, or consisting essentially of a contiguous span of 8 to 50
nucleotides of SEQ ID No 1, or the complements thereof, wherein the 3' end of said contiguous span
is located at the 3' end of said polynucleotide, and wherein the 3' end of said polynucleotide is
located within 20 nucleotides upstream of an olfactory receptor-related biallelic marker in said
sequence; optionally, wherein said olfactory receptor-related biallelic marker is selected from the
group consisting of A1 to A13, and the complements thereof; optionally, wherein the 3' end of said
polynucleotide is located 1 nucleotide upstream of said olfactory receptor-related biallelic marker in
said sequence; and optionally, wherein said polynucleotide consists essentially of a sequence
selected from the following sequences: D1 to D13 and E1 to E13, for which the respective locations
in the sequence listing are provided in Table 4.
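
A minimal Python sketch of this geometry is given below for a primer on the forward strand only. The function name, the default primer length of 19 and the example coordinates are hypothetical; the sequences D1 to D13 and E1 to E13 themselves are defined in Table 4 and are not reproduced here.

```python
def upstream_primer(seq: str, marker_pos: int, length: int = 19, gap: int = 1) -> str:
    """Return a forward-strand primer whose 3' end lies `gap` bases upstream of the marker.

    `marker_pos` is 1-based. With gap=1 the 3' end of the primer sits
    immediately 5' of the marker base, so the next base added by a
    polymerase would be the one at the marker position.
    """
    if not 1 <= gap <= 20:
        raise ValueError("gap must be between 1 and 20 nucleotides")
    end = marker_pos - gap      # 1-based position of the primer's 3' base
    start = end - length + 1    # 1-based position of the primer's 5' base
    if start < 1 or marker_pos > len(seq):
        raise ValueError("primer does not fit on the sequence")
    return seq[start - 1: end]


# Hypothetical 28-base sequence with a marker assumed at position 25.
sequence = "ACGTTGCAAGCTTAGGCATCGATCCGTA"
primer = upstream_primer(sequence, marker_pos=25)
```

Here the primer covers positions 6 to 24, and position 25 (the assumed marker) is the first base downstream of its 3' end.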

In a further embodiment, the invention encompasses isolated, purified, or recombinant
polynucleotides comprising, consisting of, or consisting essentially of a sequence selected from the
following sequences: B1 to B11 and C1 to C11, for which the respective locations in the sequence
listing are provided in Table 1.

In an additional embodiment, the invention encompasses polynucleotides for use in
hybridization assays, sequencing assays, and enzyme-based mismatch detection assays for
determining the identity of the nucleotide at an olfactory receptor-related biallelic marker in SEQ ID
No 1, or the complements thereof, as well as polynucleotides for use in amplifying segments of
nucleotides comprising an olfactory receptor-related biallelic marker in SEQ ID No 1, or the
complements thereof; optionally, wherein said olfactory receptor-related biallelic marker is selected
from the group consisting of A1 to A13, and the complements thereof.

A probe or a primer according to the invention is between 8 and 1000 nucleotides in
length, or is specified to be at least 12, 15, 18, 20, 25, 35, 40, 50, 60, 70, 80, 100, 250, 500 or 1000
nucleotides in length. More particularly, the length of these probes and primers can range from 8,
10, 15, 20, or 30 to 100 nucleotides, preferably from 10 to 50, more preferably from 15 to 30
nucleotides. Shorter probes and primers tend to lack specificity for a target nucleic acid sequence
and generally require cooler temperatures to form sufficiently stable hybrid complexes with the
template. Longer probes and primers are expensive to produce and can sometimes self-hybridize to
form hairpin structures. The appropriate length for primers and probes under a particular set of
assay conditions may be empirically determined by one of skill in the art. A preferred probe or
primer consists of a nucleic acid comprising a polynucleotide selected from the group of the
nucleotide sequences of P1 to P13 and the complementary sequence thereto, B1 to B11, C1 to C11,
D1 to D13, and E1 to E13.

Primers and other oligonucleotides according to the invention are synthesized to be 
"substantially" complementary to a strand of the olfactory receptor gene of the invention to be 
amplified. The primer sequence does not need to reflect the exact sequence of the DNA template.
Minor mismatches can be accommodated by reducing the stringency of the hybridization conditions.
Among the various methods available to design useful primers, the OSP computer software can be
used by the skilled person (see Hillier & Green, 1991). All primers contained a common upstream
oligonucleotide tail enabling the easy systematic sequencing of the resulting amplification
fragments.

The formation of stable hybrids depends on the melting temperature (Tm) of the DNA. The 
Tm depends on the length of the primer or probe, the ionic strength of the solution and the G+C 
content. The higher the G+C content of the primer or probe, the higher is the melting temperature 
because G:C pairs are held by three H bonds whereas A:T pairs have only two. The GC content in
the probes of the invention usually ranges between 10 and 75 %, preferably between 35 and 60 %,
and more preferably between 40 and 55 %.
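
The dependence of Tm on base composition can be illustrated with a rough Python sketch. It uses the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is a common approximation for short oligonucleotides and is not a method stated in this document; it ignores ionic strength. The function names and the example primer are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Return the G+C content of `seq` as a percentage."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C).

    Rough approximation for short oligonucleotides; salt concentration
    is not taken into account.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def in_gc_range(seq: str, low: float = 35.0, high: float = 60.0) -> bool:
    """Check the G+C content against a range (default: the 35-60 % range above)."""
    return low <= gc_percent(seq) <= high


# Hypothetical 18-mer: 10 A/T and 8 G/C.
primer = "ATGCGCATTAGCCGTAAT"
tm = wallace_tm(primer)
gc = gc_percent(primer)
```

Each G or C contributes twice as much as an A or T in this estimate, which reflects the statement that G:C-rich probes melt at higher temperatures.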

The primers and probes can be prepared by any suitable method, including, for example,
cloning and restriction of appropriate sequences and direct chemical synthesis by a method such as
the phosphotriester method of Narang et al. (1979), the phosphodiester method of Brown et al. (1979),
the diethylphosphoramidite method of Beaucage et al. (1981) and the solid support method described
in EP 0 707 592.

Detection probes are generally nucleic acid sequences or uncharged nucleic acid analogs
such as, for example, peptide nucleic acids which are disclosed in International Patent Application
WO 92/20702, and morpholino analogs which are described in U.S. Patents Numbered 5,185,444;
5,034,506 and 5,142,047. The probe may have to be rendered "non-extendable" in that additional
dNTPs cannot be added to the probe. In and of themselves analogs usually are non-extendable and
nucleic acid probes can be rendered non-extendable by modifying the 3' end of the probe such that
the hydroxyl group is no longer capable of participating in elongation. For example, the 3' end of
the probe can be functionalized with the capture or detection label to thereby consume or otherwise
block the hydroxyl group. Alternatively, the 3' hydroxyl group simply can be cleaved, replaced or
modified. U.S. Patent Application Serial No. 07/049,061 filed April 19, 1993 describes
modifications, which can be used to render a probe non-extendable.

Any of the polynucleotides of the present invention can be labeled, if desired, by
incorporating any label known in the art to be detectable by spectroscopic, photochemical,
biochemical, immunochemical, or chemical means. For example, useful labels include radioactive
substances (including 32P, 35S, 3H, 125I), fluorescent dyes (including 5-bromodeoxyuridine,
fluorescein, acetylaminofluorene, digoxigenin) or biotin. Preferably, polynucleotides are labeled at
their 3' and 5' ends. Examples of non-radioactive labeling of nucleic acid fragments are described
in the French patent No. FR-7810975 or by Urdea et al. (1988) or Sanchez-Pescador et al. (1988). In
addition, the probes according to the present invention may have structural characteristics such that
they allow the signal amplification, such structural characteristics being, for example, branched
DNA probes as those described by Urdea et al. in 1991 or in the European patent No. EP 0 225 807
(Chiron).

A label can also be used to capture the primer, so as to facilitate the immobilization of either
the primer or a primer extension product, such as amplified DNA, on a solid support. A capture
label is attached to the primers or probes and can be a specific binding member which forms a
binding pair with the solid phase reagent's specific binding member (e.g. biotin and streptavidin).
Therefore depending upon the type of label carried by a polynucleotide or a probe, it may be
employed to capture or to detect the target DNA. Further, it will be understood that the
polynucleotides, primers or probes provided herein, may, themselves, serve as the capture label. For
example, in the case where a solid phase reagent's binding member is a nucleic acid sequence, it
may be selected such that it binds a complementary portion of a primer or probe to thereby
immobilize the primer or probe to the solid phase. In cases where a polynucleotide probe itself
serves as the binding member, those skilled in the art will recognize that the probe will contain a
sequence or "tail" that is not complementary to the target. In the case where a polynucleotide primer
itself serves as the capture label, at least a portion of the primer will be free to hybridize with a
nucleic acid on a solid phase. DNA labeling techniques are well known to the skilled technician.

The probes of the present invention are useful for a number of purposes. They can be
notably used in Southern hybridization to genomic DNA or Northern hybridization to mRNA. The
probes can also be used to detect PCR amplification products. They may also be used to detect
mismatches in the OLF1 to OLF10 genes or mRNA using other techniques. Generally, the probes
are complementary to the OLF1 to OLF10 gene coding sequences, although probes complementary
to non-coding sequences are also contemplated. The probes of the present invention can also be
useful for genotyping the biallelic markers of the cluster of olfactory receptor genes of the present
invention.

Any of the polynucleotides, primers and probes of the present invention can be conveniently
immobilized on a solid support. Solid supports are known to those skilled in the art and include the
walls of wells of a reaction tray, test tubes, polystyrene beads, magnetic beads, nitrocellulose strips,
membranes, microparticles such as latex particles, sheep (or other animal) red blood cells, duracytes
and others. The solid support is not critical and can be selected by one skilled in the art. Thus, latex
particles, microparticles, magnetic or non-magnetic beads, membranes, plastic tubes, walls of
microtiter wells, glass or silicon chips, sheep (or other suitable animal's) red blood cells and
duracytes are all suitable examples. Suitable methods for immobilizing nucleic acids on solid phases
include ionic, hydrophobic, covalent interactions and the like. A solid support, as used herein, refers
to any material which is insoluble, or can be made insoluble by a subsequent reaction. The solid
support can be chosen for its intrinsic ability to attract and immobilize the capture reagent.
Alternatively, the solid phase can retain an additional receptor which has the ability to attract and
immobilize the capture reagent. The additional receptor can include a charged substance that is
oppositely charged with respect to the capture reagent itself or to a charged substance conjugated to
the capture reagent. As yet another alternative, the receptor molecule can be any specific binding
member which is immobilized upon (attached to) the solid support and which has the ability to
immobilize the capture reagent through a specific binding reaction. The receptor molecule enables
the indirect binding of the capture reagent to a solid support material before the performance of the
assay or during the performance of the assay. The solid phase thus can be a plastic, derivatized
plastic, magnetic or non-magnetic metal, glass or silicon surface of a test tube, microtiter well, sheet,
bead, microparticle, chip, sheep (or other suitable animal's) red blood cells, duracytes® and other
configurations known to those of ordinary skill in the art. The polynucleotides of the invention can
be attached to or immobilized on a solid support individually or in groups of at least 2, 5, 8, 10, 12,
15, 20, or 25 distinct polynucleotides of the invention to a single solid support. In addition,
polynucleotides other than those of the invention may be attached to the same solid support as one or
more polynucleotides of the invention.
Consequently, the invention also comprises a method for detecting the presence of a nucleic
acid comprising a nucleotide sequence selected from a group consisting of SEQ ID Nos 1-11, a
fragment or a variant thereof and a complementary sequence thereto in a sample, said method
comprising the following steps of:

a) bringing into contact a nucleic acid probe or a plurality of nucleic acid probes which can
hybridize with a nucleotide sequence selected from the group consisting of the nucleotide sequences
of SEQ ID Nos 1-11, a fragment or a variant thereof and a complementary sequence thereto and the
sample to be assayed; and

b) detecting the hybrid complex formed between the probe and a nucleic acid in the sample.

The invention further concerns a kit for detecting the presence of a nucleic acid comprising a
nucleotide sequence selected from a group consisting of SEQ ID Nos 1-11, a fragment or a variant
thereof and a complementary sequence thereto in a sample, said kit comprising:

a) a nucleic acid probe or a plurality of nucleic acid probes which can hybridize with a
nucleotide sequence selected from the group consisting of the nucleotide sequences of SEQ ID Nos
1-11, a fragment or a variant thereof and a complementary sequence thereto; and

b) optionally, the reagents necessary for performing the hybridization reaction.

In a first preferred embodiment of this detection method and kit, said nucleic acid probe or
the plurality of nucleic acid probes are labeled with a detectable molecule. In a second preferred
embodiment of said method and kit, said nucleic acid probe or the plurality of nucleic acid probes
has been immobilized on a substrate. In a third preferred embodiment, the nucleic acid probe or the
plurality of nucleic acid probes comprise either a sequence which is selected from the group
consisting of the nucleotide sequences of P1 to P13 and the complementary sequence thereto, B1 to
B11, C1 to C11, D1 to D13, E1 to E13 or a biallelic marker selected from the group consisting of A1
to A13 and the complements thereto.

Oligonucleotide arrays 

A substrate comprising a plurality of oligonucleotide primers or probes of the invention may 
be used either for detecting or amplifying targeted sequences in the olfactory receptor gene and may
also be used for detecting mutations in the coding or in the non-coding sequences of the olfactory 
receptor gene. 

Any polynucleotide provided herein may be attached in overlapping areas or at random
locations on the solid support. Alternatively the polynucleotides of the invention may be attached in
an ordered array wherein each polynucleotide is attached to a distinct region of the solid support
which does not overlap with the attachment site of any other polynucleotide. Preferably, such an
ordered array of polynucleotides is designed to be "addressable" where the distinct locations are
recorded and can be accessed as part of an assay procedure. Addressable polynucleotide arrays
typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a
substrate in different known locations. The knowledge of the precise location of each
polynucleotide makes these "addressable" arrays particularly useful in hybridization
assays. Any addressable array technology known in the art can be employed with the
polynucleotides of the invention. One particular embodiment of these polynucleotide arrays is
known as the Genechips™, and has been generally described in US Patent 5,143,854; PCT
publications WO 90/15070 and 92/10092. These arrays may generally be produced using
mechanical synthesis methods or light directed synthesis methods which incorporate a combination
of photolithographic methods and solid phase oligonucleotide synthesis (Fodor et al., 1991). The
immobilization of arrays of oligonucleotides on solid supports has been rendered possible by the
development of a technology generally identified as "Very Large Scale Immobilized Polymer
Synthesis" (VLSIPS™) in which, typically, probes are immobilized in a high density array on a
solid surface of a chip. Examples of VLSIPS™ technologies are provided in US Patents 5,143,854
and 5,412,087 and in PCT Publications WO 90/15070, WO 92/10092 and WO 95/11995, which
describe methods for forming oligonucleotide arrays through techniques such as light-directed
synthesis techniques. In designing strategies aimed at providing arrays of nucleotides immobilized
on solid supports, further presentation strategies were developed to order and display the
oligonucleotide arrays on the chips in an attempt to maximize hybridization patterns and sequence
information. Examples of such presentation strategies are disclosed in PCT Publications WO
94/12305, WO 94/11530, WO 97/29212 and WO 97/31256.

In another embodiment of the oligonucleotide arrays of the invention, an oligonucleotide 
probe matrix may advantageously be used to detect mutations occurring in the olfactory receptor 
gene. For this particular purpose, probes are specifically designed to have a nucleotide sequence 
allowing their hybridization to the genes that carry known mutations (either by deletion, insertion or
substitution of one or several nucleotides). By known mutations, it is meant, mutations on the 
olfactory receptor gene that have been identified according to, for example, the technique used by 
Huang et al.(1996) or Samson et al.(1996). 

Another technique that is used to detect mutations in the olfactory receptor gene is the use of
a high-density DNA array. Each oligonucleotide probe constituting a unit element of the high
density DNA array is designed to match a specific subsequence of the olfactory receptor genomic
DNA or cDNA. Thus, an array consisting of oligonucleotides complementary to subsequences of
the target gene sequence is used to determine the identity of the target sequence with the wild-type gene
sequence, measure its amount, and detect differences between the target sequence and the reference
wild-type gene sequence of the olfactory receptor gene. In one such design, termed the 4L tiled array,
a set of four probes (A, C, G, T), preferably 15-nucleotide oligomers, is implemented. In each set of
four probes, the perfect complement will hybridize more strongly than mismatched probes.
Consequently, a nucleic acid target of length L is scanned for mutations with a tiled array containing
4L probes, the whole probe set containing all the possible mutations in the known wild-type reference
sequence. The hybridization signals of the 15-mer probe set tiled array are perturbed by a single
base change in the target sequence. As a consequence, there is a characteristic loss of signal or a
"footprint" for the probes flanking a mutation position. This technique was described by Chee et al.
in 1996.
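
The 4L tiling scheme can be sketched in Python as follows. This is an illustrative reading of the design, not the layout used by Chee et al.: the function names and the reference fragment are hypothetical, the interrogated base is assumed to sit at the center of each 15-mer, and positions too close to the ends to fit a full window are skipped, so the sketch yields 4 x (L - 14) probes rather than exactly 4L.

```python
from typing import Dict, List

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_4l(reference: str, probe_len: int = 15) -> Dict[int, List[str]]:
    """Build four probes per interrogated position of `reference`.

    For each 0-based position with a full window around it, the window is
    copied four times with the central base set to A, C, G and T, and each
    copy is reverse-complemented so that the probe can pair with the target.
    Exactly one probe per set is the perfect complement of the reference.
    """
    if probe_len % 2 == 0:
        raise ValueError("probe_len must be odd")
    half = probe_len // 2
    reference = reference.upper()
    tiles: Dict[int, List[str]] = {}
    for center in range(half, len(reference) - half):
        window = reference[center - half: center + half + 1]
        tiles[center] = [
            reverse_complement(window[:half] + base + window[half + 1:])
            for base in BASES
        ]
    return tiles


# Hypothetical 20-base reference fragment.
reference = "ACGTTGCAAGCTTAGGCATC"
tiles = tile_4l(reference)
```

A single base change in the target would turn the formerly perfect probe at that position into a mismatch and would also weaken the neighboring sets whose windows overlap the changed base, which corresponds to the "footprint" described above.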

Consequently, the invention concerns an array of nucleic acid molecules comprising at least
one polynucleotide described above as probes and primers. Preferably, the invention concerns an
array of nucleic acid comprising at least two polynucleotides described above as probes and primers.

A further object of the invention consists of an array of nucleic acid sequences comprising
either at least one of the sequences selected from the group consisting of P1 to P13, B1 to B11, C1 to
C11, D1 to D13, E1 to E13, the sequences complementary thereto, a fragment thereof of at least 8,
10, 12, 15, 18, 20, 25, 30, or 40 consecutive nucleotides thereof, or at least one sequence
comprising a biallelic marker selected from the group consisting of A1 to A13 and the complements
thereto.

The invention also pertains to an array of nucleic acid sequences comprising either at least 
two of the sequences selected from the group consisting of PI to P13, Bl to Bl 1, CI to CI 1, Dl to 
D13, El to E13, the sequences complementary thereto, a fragment thereof of at least 8 consecutive 
15 nucleotides thereof, and at least two sequences comprising a biallelic marker selected from the group 
consisting of A 1 to A 13 and the complements thereof. 

B. OLF1 TO OLF10 PROTEINS AND POLYPEPTIDE FRAGMENTS

The proteins encoded by the Open Reading Frames of the OLF1 to OLF10 genes are listed 
individually in the sequence listing as SEQ ID Nos 12-21. 

The term "olfactory receptor polypeptides" is used herein to embrace all of the proteins and
polypeptides of the present invention. Also forming part of the invention are polypeptides encoded
by the polynucleotides of the invention, as well as fusion polypeptides comprising such
polypeptides. The invention embodies olfactory receptor proteins from humans, including isolated
or purified olfactory receptor proteins consisting of, consisting essentially of, or comprising the
sequences of SEQ ID Nos 12-21 or naturally-occurring variants or fragments thereof.

The present invention embodies isolated, purified, and recombinant polypeptides comprising
a contiguous span of at least 6 amino acids, preferably at least 8 or 10 amino acids, more preferably
at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID Nos 12-21. In a preferred
embodiment, the present invention embodies isolated, purified, and recombinant polypeptides
comprising a contiguous span of at least 6 amino acids, preferably at least 8 or 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID Nos 12-21 wherein said
contiguous span includes at least 1, 2, 3, 5 or 10 of the following amino acid positions in SEQ ID
Nos 12-21: 1-20, 21-40, 41-60, 61-80, 81-100, 101-120, 121-140, 141-160, 161-180, 181-200,
201-220, 221-240, 241-260, 261-280, 281-300, and 301 to the terminal amino acid of the olfactory
receptor proteins, to the extent that such amino acid positions are consistent with the lengths of the
particular olfactory receptor protein being referred to. In another preferred embodiment, the present
invention embodies isolated, purified, and recombinant polypeptides comprising a contiguous span
of at least 6 amino acids, preferably at least 8 or 10 amino acids, more preferably at least 12, 15, 20,
25, 30, 40, 50, or 100 amino acids of a sequence selected from the group consisting of SEQ ID Nos
12, 14, 17, 19 and 21 wherein said contiguous span includes at least 1, 2, 3, 5 or 10 of the following
amino acid positions of said selected sequence: 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70,
71-80, 81-90, 91-100, 101-110, 111-120, 121-130, 131-140, 141-150, 151-160, 161-170, 171-180,
181-190, and 191 to the terminal amino acid of the olfactory receptor proteins, to the extent that such
amino acid positions are consistent with the lengths of the particular olfactory receptor protein being
referred to.

In further preferred embodiments, the present invention embodies isolated, purified, and
recombinant polypeptides comprising a contiguous span of at least 6 amino acids, preferably at least
8 or 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of a
sequence selected from the group consisting of SEQ ID Nos 12-21, wherein said contiguous span
includes at least one amino acid at the following positions of said selected sequence:

i) 1-3, 10, 36, 21, 28, 33, 34, 36, 42-44, 46, 49, 53, 54, 57, 59, 63, and 64 for SEQ ID
No 12;

ii) 2, 4, 6, 8, 18, 25, 34, 37, 44, 52, 56, 80, 83, 89, 98, 101, 102, 113, 114, 117, 120,
139, 148, 158, 186, 195, 212, 219, 247, 266, 270, 280, 295, 298, 299, 301, 311, and
313-315 for SEQ ID No 13;

iii) 2-4, 6, 18, 21, 25, 34, 37, 98, 99, 102, 113, 114, 133, 143, 148, 158-163, 166, 167,
169, and 170 for SEQ ID No 14;

iv) 2, 4, 6, 8, 18, 25, 34, 37, 44, 52, 54, 56, 80, 83, 89, 98, 101, 102, 113, 114, 117, 120,
139, 148, 158, 186, 195, 212, 219, 247, 266, 270, 280, 298, 299, 311, and 313-315
for SEQ ID No 15;

v) 3, 18, 20, 25, 34, 47, 49, 67, 97, 100, 107, 108, 112, 113, 126, 135, 142, 146, 147,
157, 159-160, 194, 196, 228, 245, 264, 265, 269, 279, 298, and 302 for SEQ ID No 16;

vi) 2, 6, 18, 20, 33, 34, 37, 65, 68, 69, 72, 86, 88, 101, 107, 113, 114, 148, 158, 161,
164, 195, and 198 for SEQ ID No 17;

vii) 2, 6, 7, 52, 56, 67, 88, 94, 97, 110, 113, 116, 119, 120, 127, 135, 150, 153, 164, 174,
175, 180, 184, 217, 221, 259, 261, and 268 for SEQ ID No 18;

viii) 17, 18, 20, 28, 33, 35, 49-52, 105, 111, and 112 for SEQ ID No 19;

ix) 17, 20, 33, 35, 49-53, 56, 111, 112, 132, 138, 141, 147, 154, 157, 160, 163, 164,
194, 197, 204, 211, 214, 218, 219, 252, 265, 286, 295, 301, 303, 305, 306 and 309
for SEQ ID No 20; and

x) 9, 18, 26-28, 34, 47 and 50 for SEQ ID No 21,

to the extent that such amino acid positions are consistent with the lengths of the particular
olfactory receptor protein being referred to.
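
As a minimal sketch of how the position-based definitions above might be checked, the following function tests whether a contiguous span of a receptor sequence includes at least one of a given set of positions. The function name, the 1-based inclusive numbering convention and the example spans are assumptions made for illustration; the position list is the one read from item x) for SEQ ID No 21.

```python
def span_includes_position(start, length, positions):
    """Return True if the span [start, start + length - 1] contains a listed position.

    `start` and the entries of `positions` are 1-based amino acid positions.
    """
    end = start + length - 1
    return any(start <= p <= end for p in positions)

# Positions read from item x) for SEQ ID No 21 (the range 26-28 is expanded).
SEQ21_POSITIONS = [9, 18, 26, 27, 28, 34, 47, 50]

# A hypothetical 6-residue span starting at position 45 covers positions 45 to 50.
covers = span_includes_position(45, 6, SEQ21_POSITIONS)
```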






WO 00/21985 PCT/IB99/01729 


Other preferred OLF1 to OLF10 polypeptide fragments are those located outside the
transmembrane domains, most preferably peptide fragments naturally exposed on the cell
membrane, particularly those that are available for binding to ligand molecules, either odorant
substances or molecules or antibodies directed to the olfactory receptor polypeptides of the
invention. Such transmembrane domains TM1 to TM7 are boxed in Figure 1. In other preferred
embodiments the contiguous stretch of amino acids comprises the site of a mutation or functional
mutation, including a deletion, addition, swap or truncation of the amino acids in the olfactory
receptor protein sequence.

The invention also encompasses purified, isolated, or recombinant polypeptides
comprising an amino acid sequence having at least 70, 75, 80, 85, 90, 95, 98 or 99% amino acid
identity with the amino acid sequence of SEQ ID Nos 12-21 or a fragment thereof.
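
The identity thresholds recited above can be illustrated by a minimal sketch. It assumes that the two sequences have already been aligned to equal length, with "-" marking gaps, and that identity is computed over the aligned columns in which neither sequence has a gap. This convention, the function name and the toy sequences are assumptions made for illustration; the definition of identity given elsewhere in this application governs.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # columns containing a gap are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared

# Hypothetical 10-residue toy sequences differing at one position.
identity = percent_identity("MKTLLAVLCL", "MKTLIAVLCL")
meets_threshold = identity >= 90
```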

The invention also encompasses an olfactory receptor polypeptide or a fragment or a variant
thereof in which at least one peptide bond has been modified as defined in the "Definitions"
section.

A further object of the invention concerns a purified or isolated polypeptide which is
encoded by a nucleic acid comprising a nucleotide sequence selected from the group consisting of
SEQ ID Nos 1-11 or a fragment or variant thereof.

Such mutated olfactory receptor proteins may be the target of diagnostic tools, such as
specific monoclonal or polyclonal antibodies, useful for detecting the mutated olfactory receptor
proteins in a sample.

Olfactory receptor proteins are preferably isolated from human or mammalian tissue samples 
or expressed from human or mammalian genes. 

The olfactory receptor polypeptides of the invention may be extracted from cells or tissues of
humans or non-human animals. Methods for purifying proteins are known in the art, and include the
use of detergents or chaotropic agents to disrupt particles followed by differential extraction and
separation of the polypeptides by ion exchange chromatography, affinity chromatography,
sedimentation according to density, and gel electrophoresis.

In addition, shorter protein fragments may also be prepared by the conventional methods of
chemical synthesis, either in a homogenous solution or in solid phase. As an illustrative embodiment
of such chemical polypeptide synthesis techniques, the homogenous solution technique described by
Houbenweyl in 1974 may be cited. For solid phase synthesis the technique described by
Merrifield (1965) may be used in particular.

Alternatively, the proteins of the invention can be made using routine expression methods
known in the art as described below and in the section "Expression of a OLF1 to OLF10 coding
polynucleotide". Briefly, the polynucleotide encoding the desired polypeptide is ligated into an
expression vector suitable for any convenient host. Both eukaryotic and prokaryotic host systems may be
used in forming recombinant polypeptides. The polypeptide is then isolated from lysed cells or from
the culture medium and purified to the extent needed for its intended use. Purification is by any
technique known in the art, for example, differential extraction, salt fractionation, chromatography,
centrifugation, and the like. See, for example, Methods in Enzymology for a variety of methods for
purifying proteins.

Any olfactory receptor cDNA, including SEQ ID Nos 12-21, may be used to express olfactory
receptor proteins and polypeptides. The nucleic acid encoding the olfactory receptor protein or
polypeptide to be expressed is operably linked to a promoter in an expression vector using conventional
cloning technology. The olfactory receptor insert in the expression vector may comprise the full coding
sequence for the olfactory receptor protein or a portion thereof. For example, the olfactory receptor
derived insert may encode a polypeptide comprising at least 10 consecutive amino acids of the olfactory
receptor protein of SEQ ID Nos 12-21, including any of the polypeptide fragments defined in this
section.

The expression vector may be any of the mammalian, yeast, insect or bacterial expression systems
known in the art. Commercially available vectors and expression systems are available from a variety
of suppliers including Genetics Institute (Cambridge, MA), Stratagene (La Jolla, California), Promega
(Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and
facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized
for the particular expression organism in which the expression vector is introduced, as explained by
Hatfield, et al., U.S. Patent No. 5,082,767.

In one embodiment, the entire coding sequence of the olfactory receptor cDNA through the
poly A signal of the cDNA is operably linked to a promoter in the expression vector. Alternatively, if
the nucleic acid encoding a portion of the olfactory receptor protein lacks a methionine to serve as the
initiation site, an initiating methionine can be introduced next to the first codon of the nucleic acid using
conventional techniques. Similarly, if the insert from the olfactory receptor cDNA lacks a poly A
signal, this sequence can be added to the construct by, for example, splicing out the poly A signal from
pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the
mammalian expression vector pXT1 (Stratagene).
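
The check for an initiating methionine can be sketched in silico as follows. This is only an illustration of the sequence-level logic, assuming an in-frame coding fragment given as a DNA string; the function name and the example fragment are hypothetical, and the sketch does not replace the conventional cloning techniques referred to above.

```python
def ensure_start_codon(coding_fragment):
    """Return the in-frame coding fragment with an ATG codon at its 5' end.

    The fragment is assumed to start at the first base of a codon; an ATG is
    prepended only when the fragment does not already begin with one.
    """
    seq = coding_fragment.upper()
    if len(seq) % 3 != 0:
        raise ValueError("fragment length must be a multiple of three")
    if seq.startswith("ATG"):
        return seq
    return "ATG" + seq

# Hypothetical fragment encoding Ala-Leu and lacking an initiation codon.
construct = ensure_start_codon("gctctg")
```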

The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life
Technologies, Inc., Grand Island, New York) under conditions outlined in the product specification.
Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St.
Louis, Missouri).

The above procedures may also be used to express a mutant olfactory receptor protein 
responsible for a detectable phenotype or a portion thereof. 

Purification of the recombinant protein or peptide according to the present invention may be
realized by passage onto a Nickel or Copper affinity chromatography column. The Nickel
chromatography column may contain the Ni-NTA resin (Porath et al., 1975). The polypeptides or
peptides thus obtained may be purified, for example by high performance liquid chromatography,
such as reverse phase and/or cationic exchange HPLC, as described by Rougeot et al. (1994). The
reason to prefer this kind of peptide or protein purification is the lack of side products found in the
elution samples which renders the resultant purified protein or peptide more suitable for a
therapeutic use.

The expressed protein may also be purified using other conventional purification techniques
such as ammonium sulfate precipitation or chromatographic separation based on size or charge. The
protein encoded by the nucleic acid insert may also be purified using standard immunochromatography
techniques. In such procedures, polyclonal or monoclonal antibodies capable of specifically binding to
the expressed olfactory receptor proteins of SEQ ID Nos 12-21, or a fragment or a variant thereof, have
been previously immobilized onto a chromatography matrix. Such antibodies are described in the
section "Antibodies that bind olfactory receptor polypeptides" below. Then, a solution containing the
expressed olfactory receptor protein or portion thereof, such as a cell extract, is applied to the
chromatography column in conditions allowing the expressed protein to bind to the antibodies in the
immunochromatography column. Thereafter, the column is washed to remove non-specifically bound
proteins. The specifically bound expressed protein is then released from the column and recovered
using standard techniques.

If antibody production is not possible, the nucleic acids encoding the olfactory receptor protein
or a portion thereof may be incorporated into expression vectors designed for use in purification schemes
employing chimeric polypeptides. In such strategies the nucleic acid encoding the olfactory receptor
protein or a portion thereof is inserted in frame with the gene encoding the other half of the chimera.
The other half of the chimera may be β-globin or a nickel binding polypeptide encoding sequence. A
chromatography matrix having antibody to β-globin or nickel attached thereto is then used to purify the
chimeric protein. Protease cleavage sites may be engineered between the β-globin gene or the nickel
binding polypeptide and the olfactory receptor protein or portion thereof. Thus, the two polypeptides of
the chimera may be separated from one another by protease digestion.

One useful expression vector for generating β-globin chimeric proteins is pSG5 (Stratagene),
which encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed
transcript, and the polyadenylation signal incorporated into the construct increases the level of
expression. These techniques are well known to those skilled in the art of molecular biology. Standard
methods are published in methods texts such as Davis et al. (1986) and many of the methods are
available from Stratagene, Life Technologies, Inc., or Promega. Polypeptide may additionally be
produced from the construct using in vitro translation systems such as the In vitro Express™ Translation
Kit (Stratagene).

To confirm expression of the olfactory receptor protein or a portion thereof, the proteins
expressed from host cells containing an expression vector containing an insert encoding the olfactory
receptor protein or a portion thereof can be compared to the proteins expressed in host cells containing
the expression vector without an insert. The presence of a band in samples from cells containing the
expression vector with an insert which is absent in samples from cells containing the expression vector
without an insert indicates that the olfactory receptor protein or a portion thereof is being expressed.
Generally, the band will have the mobility expected for the olfactory receptor protein or portion thereof.
However, the band may have a mobility different than that expected as a result of modifications such as
glycosylation, ubiquitination, or enzymatic cleavage.
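
As a rough aid for judging whether an observed band has the expected mobility, the molecular mass of an unmodified polypeptide may be approximated from its length. The sketch below assumes the common rule of thumb of about 110 Da per amino acid residue; the function name and the 300-residue example are hypothetical and are not taken from the sequence listing.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an assumption of this sketch

def approx_mass_kda(n_residues):
    """Approximate mass in kDa of an unmodified polypeptide of `n_residues` residues."""
    if n_residues <= 0:
        raise ValueError("number of residues must be positive")
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0

# A hypothetical 300-residue polypeptide would be expected near this mass.
expected_kda = approx_mass_kda(300)
```

Such an estimate ignores the modifications mentioned above, and membrane proteins may in any case migrate anomalously in gel electrophoresis.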

Other suitable techniques for producing and purifying the olfactory receptor proteins of the
invention or their fragments or variants are also described under the heading "Methods for
screening substances or molecules interacting with an olfactory receptor protein".

Thus, the present invention also concerns a method for producing a polypeptide of the
invention, and especially a polypeptide selected from the group of SEQ ID Nos 12-21 or a fragment
or a variant thereof, wherein said method comprises the steps of:

a) culturing, in an appropriate culture medium, a cell host previously transformed or
transfected with the recombinant vector comprising a nucleic acid encoding an olfactory receptor
polypeptide of the invention, or a fragment or a variant thereof;

b) harvesting the culture medium thus conditioned or lysing the cell host, for example by
sonication or by an osmotic shock;

c) separating or purifying, from the said culture medium, or from the pellet of the resultant
host cell lysate, the thus produced polypeptide of interest; and

d) optionally characterizing the produced polypeptide of interest.

In a specific embodiment of the above method, step a) is preceded by a step wherein the
nucleic acid coding for an olfactory receptor polypeptide, or a fragment or a variant thereof, is
inserted in an appropriate vector, optionally after an appropriate cleavage of this amplified nucleic
acid with one or several restriction endonucleases. The nucleic acid coding for an olfactory receptor
polypeptide or a fragment or a variant thereof may be the resulting product of an amplification
reaction using a pair of primers according to the invention (by PCR, SDA, TAS, 3SR, NASBA, TMA
etc.).

C. ANTIBODIES THAT BIND OLFACTORY RECEPTOR POLYPEPTIDES 

Any olfactory receptor polypeptide or whole protein may be used to generate antibodies
capable of specifically binding to an expressed olfactory receptor protein or fragments thereof as
described.

One antibody composition of the invention is capable of specifically binding, or specifically
binds, to a variant of the olfactory receptor protein of SEQ ID Nos 12-21. For an antibody
composition to specifically bind to a first variant of olfactory receptor protein, it must demonstrate at
least a 5%, 10%, 15%, 20%, 25%, 50%, or 100% greater binding affinity for a first variant of the
olfactory receptor protein than for a second variant of the olfactory receptor protein in an ELISA,
RIA, or other antibody-based binding assay.

In a preferred embodiment, the invention concerns antibody compositions, either polyclonal
or monoclonal, capable of selectively binding, or that selectively bind, to an epitope-containing
polypeptide comprising any of the fragments described in the section "OLF1 to OLF10 proteins and
polypeptide fragments". Preferred peptide fragments are portions of OLF1 to OLF10 polypeptides
that are located outside the transmembrane domains, most preferably peptide fragments naturally
exposed on the cell membrane, particularly those that are available for binding to ligand molecules,
either odorant substances or molecules or antibodies directed to the olfactory receptor polypeptides
of the invention.

The invention also concerns a purified or isolated antibody capable of specifically binding to
a mutated olfactory receptor protein or to a fragment or variant thereof comprising an epitope of the
mutated olfactory receptor protein. In another preferred embodiment, the present invention concerns
an antibody capable of binding to a polypeptide comprising at least 10 consecutive amino acids of an
olfactory receptor protein.

In a preferred embodiment, the invention concerns the use in the manufacture of antibodies
of a polypeptide comprising any of the fragments described in the section "OLF1 to OLF10 proteins
and polypeptide fragments". Preferred peptide fragments are portions of OLF1 to OLF10
polypeptides that are located outside the transmembrane domains, most preferably peptide fragments
naturally exposed on the cell membrane, particularly those that are available for recognition of
ligand molecules, either odorant substances or molecules or antibodies directed to the olfactory
receptor polypeptides of the invention.

The olfactory receptor expressed from a DNA comprising at least one of the nucleic
sequences of SEQ ID Nos 1-11 or a fragment or a variant thereof may also be used to generate
antibodies capable of specifically binding to the expressed olfactory receptor or fragments or
variants thereof. In a preferred embodiment, any of the polynucleotide fragments encoding a
polypeptide described in the section "Coding regions of the olfactory receptor gene" may be used to
generate such antibodies.

Substantially pure protein or polypeptide is isolated from transfected or transformed cells
containing an expression vector encoding the olfactory receptor protein or a portion thereof. The
concentration of protein in the final preparation is adjusted, for example, by concentration on an
Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibodies to the
protein can then be prepared as follows:

1. Monoclonal Antibody Production by Hybridoma Fusion 

Monoclonal antibody to epitopes in the olfactory receptor of the present invention or a portion
thereof can be prepared from murine hybridomas according to the classical method of Kohler and
Milstein (1975) or derivative methods thereof. Briefly, a mouse is repetitively inoculated with a few
micrograms of the considered olfactory receptor or a portion thereof over a period of a few weeks. The
mouse is then sacrificed, and the antibody producing cells of the spleen isolated. The spleen cells are
fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells
destroyed by growth of the system on selective media comprising aminopterin (HAT media). The
successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate
where growth of the culture is continued. Antibody-producing clones are identified by detection of
antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally
described by Engvall (1980), and derivative methods thereof. Selected positive clones can be expanded
and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody
production are described in Davis, L. et al.

2. Polyclonal Antibody Production by Immunization 

Polyclonal antiserum containing antibodies to heterogeneous epitopes in the olfactory receptor
of the present invention or a portion thereof can be prepared by immunizing suitable animals with the
considered olfactory receptor or a portion thereof, which can be unmodified or modified to enhance
immunogenicity. A suitable non-human animal, preferably a non-human mammal, is selected,
usually a mouse, rat, rabbit, goat, or horse. Alternatively, a crude preparation which has been
enriched for olfactory receptor concentration can be used to generate antibodies. Such proteins,
fragments or preparations are introduced into the non-human mammal in the presence of an
appropriate adjuvant (e.g. aluminum hydroxide, RIBI, etc.) which is known in the art. In addition
the protein, fragment or preparation can be pretreated with an agent which will increase antigenicity;
such agents are known in the art and include, for example, methylated bovine serum albumin
(mBSA), bovine serum albumin (BSA), Hepatitis B surface antigen, and keyhole limpet hemocyanin
(KLH). Serum from the immunized animal is collected, treated and tested according to known
procedures. If the serum contains polyclonal antibodies to undesired epitopes, the polyclonal
antibodies can be purified by immunoaffinity chromatography.

Effective polyclonal antibody production is affected by many factors related both to the antigen
and the host species. Also, host animals vary in response to site of inoculations and dose, with both
inadequate and excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen
administered at multiple intradermal sites appear to be most reliable. Techniques for producing and
processing polyclonal antisera are known in the art; see, for example, Mayer and Walker (1987). An
effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al. (1971).
Booster injections can be given at regular intervals, and antiserum harvested when antibody titer
thereof, as determined semi-quantitatively, for example, by double immunodiffusion in agar against
known concentrations of the antigen, begins to fall. See, for example, Ouchterlony et al. (1973).
Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum. Affinity of the
antisera for the antigen is determined by preparing competitive binding curves, as described, for
example, by Fisher (1980).

Antibody preparations prepared according to either the monoclonal or the polyclonal protocol
are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances
in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of
antigen in a biological sample. The antibodies may also be used in therapeutic compositions for killing
cells expressing the protein or reducing the levels of the protein in the body.

Non-human animals or mammals, whether wild-type or transgenic, which express a different
species of olfactory receptor than the one to which antibody binding is desired, and animals which
do not express olfactory receptor (i.e. an olfactory receptor knock out animal as described herein) are
particularly useful for preparing antibodies. Olfactory receptor knock out animals will recognize all
or most of the exposed regions of an olfactory receptor protein as foreign antigens, and therefore
produce antibodies directed to a wider array of olfactory receptor epitopes. Moreover, smaller
polypeptides with only 10 to 30 amino acids may be useful in obtaining specific binding to any one
of the olfactory receptor proteins. In addition, the humoral immune system of animals which
produce a species of olfactory receptor that resembles the antigenic sequence will preferentially
recognize the differences between the animal's native olfactory receptor species and the antigen
sequence, and produce antibodies to these unique sites in the antigen sequence. Such a technique
will be particularly useful in obtaining antibodies that specifically bind to any one of the olfactory
receptor proteins.

The present invention also includes chimeric single chain Fv antibody fragments (Martineau et
al., 1998), antibody fragments obtained through phage display libraries (Ridder et al., 1995; Vaughan et
al., 1995) and humanized antibodies (Reinmann et al., 1997; Leger et al., 1997).

The antibodies of the invention may be labeled by any one of the radioactive, fluorescent or
enzymatic labels known in the art.

Consequently, the invention is also directed to a method for specifically detecting the
presence of a polypeptide according to the invention in a biological sample, said method comprising
the following steps:

a) bringing the biological sample into contact with an antibody according to the
invention;

b) detecting the antigen-antibody complex formed.

Also part of the invention is a diagnostic kit for detecting in vitro the presence of a
polypeptide according to the present invention in a biological sample, wherein said kit comprises:

a) a polyclonal or monoclonal antibody as described above, optionally labeled;

b) a reagent allowing the detection of the antigen-antibody complexes formed, said
reagent optionally carrying a label, or being able to be recognized itself by a labeled reagent,
more particularly in the case when the above-mentioned monoclonal or polyclonal antibody
is not itself labeled.

OLFACTORY RECEPTOR-RELATED BIALLELIC MARKERS

The invention also concerns olfactory receptor-related biallelic markers. As used herein the
term "olfactory receptor-related biallelic marker" relates to a set of biallelic markers in linkage
disequilibrium with the olfactory receptor gene. The term olfactory receptor-related biallelic marker
includes the biallelic markers designated A1 to A13.

The biallelic markers of the present invention, namely A1 to A13, are disclosed in Table 2 of
Example 4. The 13 olfactory receptor-related biallelic markers, A1 to A13, are all located in the
genomic non coding regions of the olfactory gene cluster of the invention. Their precise location on
the olfactory receptor genomic sequence and their single base polymorphism are indicated in Table 2
and also as features in the sequence listing for SEQ ID No 1. Appropriate pairs of primers allowing
the amplification of a nucleic acid containing the polymorphic base of the disclosed olfactory
receptor biallelic marker are also listed in Table 1 of Example 3 and in features of SEQ ID No 1.

In the present invention, the biallelic markers can be defined by nucleotide sequences
corresponding to oligonucleotides of 47 bases in length comprising the polymorphic base at the
middle. More particularly, the biallelic markers can be defined by the polynucleotides P1
to P13.
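
A 47-base sequence with the polymorphic base at the middle has 23 flanking bases on each side. The following minimal sketch shows how such sequences could be derived from a genomic sequence for each allele; the function name, the 0-based indexing and the toy genomic string are assumptions made for illustration and do not reproduce any sequence of the sequence listing.

```python
def marker_47mers(genomic, snp_index, alleles, length=47):
    """Return one oligonucleotide per allele, centred on the polymorphic base.

    `snp_index` is the 0-based index of the polymorphic base in `genomic`;
    `length` is assumed to be odd so that the base sits exactly in the middle.
    """
    flank = length // 2  # 23 bases on each side for a 47-mer
    if snp_index - flank < 0 or snp_index + flank >= len(genomic):
        raise ValueError("not enough flanking sequence around the polymorphic base")
    left = genomic[snp_index - flank:snp_index]
    right = genomic[snp_index + 1:snp_index + flank + 1]
    return [left + allele + right for allele in alleles]

# Toy genomic string: 23 A's, the polymorphic base, then 23 G's.
toy_genomic = "A" * 23 + "C" + "G" * 23
oligos = marker_47mers(toy_genomic, 23, ("C", "T"))
```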

The biallelic markers contained in the olfactory gene cluster of the present invention, or a
subset of such biallelic markers, are useful tools to perform association studies, preferably to
perform association studies between the statistically significant occurrence of an allele of said
biallelic marker in the genome of an individual and a specific phenotype, including a phenotype
consisting of an alteration of the olfactory perception of odorant substances or molecules by said
individual. The biallelic markers of the invention can also be used, for example, in linkage analysis
in which evidence is sought for cosegregation between a locus and a putative trait locus using family
studies, such as an alteration of olfactory perception. In addition, the biallelic markers of the
invention may be included in the generation of any complete or partial genetic map of the human
genome. These different uses are specifically contemplated in the present invention and claims.

1. Identification of biallelic markers 

Any of a variety of methods can be used to screen a genomic fragment for single nucleotide
polymorphisms such as differential hybridization with oligonucleotide probes, detection of changes
in the mobility measured by gel electrophoresis or direct sequencing of the amplified nucleic acid.
A preferred method for identifying biallelic markers involves comparative sequencing of genomic
DNA fragments from an appropriate number of unrelated individuals.

In a first embodiment, DNA samples from unrelated individuals are pooled together,
following which the genomic DNA of interest is amplified and sequenced. The nucleotide
sequences thus obtained are then analyzed to identify significant polymorphisms. One of the major
advantages of this method resides in the fact that the pooling of the DNA samples substantially
reduces the number of DNA amplification reactions and sequencing reactions that must be carried
out. Moreover, this method is sufficiently sensitive so that a biallelic marker obtained thereby
usually shows a sufficient degree of informativeness to be useful in conducting association studies.

In a second embodiment, the DNA samples are not pooled and are therefore amplified and
sequenced individually. This method is usually preferred when biallelic markers need to be 
identified in order to perform association studies within candidate genes. Preferably, highly relevant 
gene regions such as promoter regions or exon regions may be screened for biallelic markers. A 
biallelic marker obtained using this method may show a lower degree of informativeness for
conducting association studies, e.g. if the frequency of its less frequent allele is less than about
10%. Such a biallelic marker will, however, be sufficiently informative to conduct association
studies, and it will further be appreciated that including less informative biallelic markers in the
genetic analysis studies of the present invention may allow in some cases the direct identification of
causal mutations, which may, depending on their penetrance, be rare mutations.
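
As a hedged illustration of what is meant by the frequency of the less frequent allele, the following sketch computes that frequency from a list of genotypes, together with the expected heterozygosity 2pq. The use of expected heterozygosity as the measure of informativeness is an assumption of this example, since the text above does not fix a particular measure; the function names are likewise illustrative.

```python
def allele_frequencies(genotypes):
    """Count alleles over diploid genotypes given as 2-character strings, e.g. "AG"."""
    counts = {}
    for genotype in genotypes:
        for allele in genotype:
            counts[allele] = counts.get(allele, 0) + 1
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def informativeness(genotypes):
    """Return (frequency of the less frequent allele, expected heterozygosity 2pq)."""
    freqs = allele_frequencies(genotypes)
    if len(freqs) != 2:
        raise ValueError("expected exactly two alleles at a biallelic marker")
    q = min(freqs.values())
    return q, 2 * q * (1 - q)
```

For example, eight "AA" and two "AG" individuals give a less frequent allele frequency of 2/20, i.e. the 10% boundary mentioned above.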

The following is a description of the various parameters of a preferred method used by the 
inventors for the identification of the biallelic markers of the present invention. 

Genomic DNA Samples 

The genomic DNA samples from which the biallelic markers of the present invention are
generated are preferably obtained from unrelated individuals corresponding to a heterogeneous
population of known ethnic background. The number of individuals from whom DNA samples are
obtained can vary substantially, preferably from about 10 to about 1000, more preferably from about
50 to about 200 individuals. It is usually preferred to collect DNA samples from at least about 100
individuals in order to have sufficient polymorphic diversity in a given population to identify as
many markers as possible and to generate statistically significant results.

As for the source of the genomic DNA to be subjected to analysis, any test sample can be 
foreseen without any particular limitation. These test samples include biological samples, which can 
be tested by the methods of the present invention described herein, and include human and animal 
body fluids such as whole blood, serum, plasma, cerebrospinal fluid, urine, lymph fluids, and
various external secretions of the respiratory, intestinal and genitourinary tracts, tears, saliva, milk,
white blood cells, myelomas and the like; biological fluids such as cell culture supernatants; fixed 
tissue specimens including tumor and non-tumor tissue and lymph node tissues; bone marrow 
aspirates and fixed cell specimens. The preferred source of genomic DNA used in the present 
invention is from peripheral venous blood of each donor. Techniques to prepare genomic DNA
from biological samples are well known to the skilled technician. Details of a preferred embodiment
are provided in Example 2. The person skilled in the art can choose to amplify pooled or unpooled 
DNA samples. 

DNA Amplification 

The identification of biallelic markers in a sample of genomic DNA may be facilitated 
through the use of DNA amplification methods. DNA samples can be pooled or unpooled for the
amplification step. DNA amplification techniques are well known to those skilled in the art. 
Amplification techniques that can be used in the context of the present invention include, but
are not limited to, the ligase chain reaction (LCR) described in EP-A-320 308, WO 9320227 and
EP-A-439 182, the polymerase chain reaction (PCR, RT-PCR) and techniques such as the nucleic
acid sequence based amplification (NASBA) described in Guatelli J.C. et al. (1990) and in Compton
J. (1991), Q-beta amplification as described in European Patent Application No 4544610, strand
displacement amplification as described in Walker et al. (1996) and EP A 684 315, and target
mediated amplification as described in PCT Publication WO 9322461.

LCR and Gap LCR are exponential amplification techniques; both depend on DNA ligase to
join adjacent primers annealed to a DNA molecule. In Ligase Chain Reaction (LCR), probe pairs
are used which include two primary (first and second) and two secondary (third and fourth) probes,
all of which are employed in molar excess to target. The first probe hybridizes to a first segment of
the target strand and the second probe hybridizes to a second segment of the target strand, the first
and second segments being contiguous so that the primary probes abut one another in
5' phosphate-3' hydroxyl relationship, and so that a ligase can covalently fuse or ligate the two
probes into a fused product. In addition, a third (secondary) probe can hybridize to a portion of the
first probe and a fourth (secondary) probe can hybridize to a portion of the second probe in a similar
abutting fashion. Of course, if the target is initially double stranded, the secondary probes also will
hybridize to the target complement in the first instance. Once the ligated strand of primary probes is
separated from the target strand, it will hybridize with the third and fourth probes, which can be
ligated to form a complementary, secondary ligated product. It is important to realize that the ligated
products are functionally equivalent to either the target or its complement. By repeated cycles of
hybridization and ligation, amplification of the target sequence is achieved. A method for multiplex
LCR has also been described (WO 9320227). Gap LCR (GLCR) is a version of LCR where the
probes are not adjacent but are separated by 2 to 3 bases.

For amplification of mRNAs, it is within the scope of the present invention to reverse
transcribe mRNA into cDNA followed by polymerase chain reaction (RT-PCR); or, to use a single
enzyme for both steps as described in U.S. Patent No. 5,322,770; or, to use Asymmetric Gap LCR
(RT-AGLCR) as described by Marshall et al. (1994). AGLCR is a modification of GLCR that
allows the amplification of RNA.
The PCR technology is the preferred amplification technique used in the present invention.
A variety of PCR techniques are familiar to those skilled in the art. For a review of PCR technology,
see White (1997) and the publication entitled "PCR Methods and Applications" (1991, Cold Spring
Harbor Laboratory Press). In each of these PCR procedures, PCR primers on either side of the
nucleic acid sequences to be amplified are added to a suitably prepared nucleic acid sample along
with dNTPs and a thermostable polymerase such as Taq polymerase, Pfu polymerase, or Vent
polymerase. The nucleic acid in the sample is denatured and the PCR primers are specifically
hybridized to complementary nucleic acid sequences in the sample. The hybridized primers are
extended. Thereafter, another cycle of denaturation, hybridization, and extension is initiated. The
cycles are repeated multiple times to produce an amplified fragment containing the nucleic acid
sequence between the primer sites. PCR has further been described in several patents including US
Patents 4,683,195; 4,683,202; and 4,965,188.
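
The relationship between the two primers and the amplified fragment described above can be sketched, under simplifying assumptions, as an in silico search for the primer sites on a template. The sketch assumes perfect primer matches, a single binding site for each primer and an idealised doubling of product at each cycle; the function names are illustrative and do not correspond to any particular software.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse):
    """Return the fragment bounded by the two primer sites, or None.

    template is the top strand written 5'->3'; both primers are written 5'->3'.
    The forward primer matches the top strand; the reverse primer matches the
    bottom strand, so its reverse complement is searched on the top strand.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    return template[start:site + len(reverse_site)]


def ideal_copies(initial_copies, cycles):
    """Upper bound on product copies if every cycle doubles the target."""
    return initial_copies * 2 ** cycles
```

The returned fragment starts with the forward primer and ends with the reverse complement of the reverse primer, which corresponds to the nucleic acid sequence between the primer sites together with the primers themselves.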
The PCR technology is the preferred amplification technique used to identify new biallelic
markers. A typical example of a PCR reaction suitable for the purposes of the present invention is
provided in Example 3. 

One of the aspects of the present invention is a method for the amplification of the human
olfactory receptor gene, particularly of a fragment of the genomic sequence of SEQ ID No 1 or of
the coding region sequences of SEQ ID Nos 2-11, or a fragment or a variant thereof in a test sample,
preferably using the PCR technology. This method comprises the steps of:

a) contacting a test sample with amplification reaction reagents comprising a pair of
amplification primers as described above and located on either side of the polynucleotide
region to be amplified, and

b) optionally, detecting the amplification products.

The invention also concerns a kit for the amplification of an olfactory receptor gene sequence,
particularly of a portion of the genomic sequence of SEQ ID No 1 or of the coding region sequences
of SEQ ID Nos 2-11, or a variant thereof in a test sample, wherein said kit comprises:

a) a pair of oligonucleotide primers located on either side of the olfactory receptor region to
be amplified;

b) optionally, the reagents necessary for performing the amplification reaction.

In one embodiment of the above amplification method and kit, the amplification product is
detected by hybridization with a labeled probe having a sequence which is complementary to the
amplified region. In another embodiment of the above amplification method and kit, primers
comprise a sequence which is selected from the group consisting of the nucleotide sequences of B1
to B11, C1 to C11, D1 to D13, and E1 to E13.

In a first embodiment of the present invention, biallelic markers are identified using genomic 
sequence information generated by the inventors. Sequenced genomic DNA fragments are used to 
design primers for the amplification of 500 bp fragments. These 500 bp fragments are amplified
from genomic DNA and are scanned for biallelic markers. Primers may be designed using the OSP
software (Hillier L. and Green P., 1991). All primers may contain, upstream of the specific target 
bases, a common oligonucleotide tail that serves as a sequencing primer. Those skilled in the art are 
familiar with primer extensions, which can be used for these purposes. 
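
The scanning strategy described above, in which a sequenced region is covered by fragments of about 500 bp and each primer carries a common tail upstream of its target-specific bases, may be sketched as follows. This is not the OSP software; the overlap between neighbouring fragments and the tail sequence (the widely used M13 forward sequencing primer) are assumptions chosen for the example, since the text does not specify them.

```python
# Assumed tail for illustration only: the M13 forward (-21) sequencing primer.
COMMON_TAIL = "TGTAAAACGACGGCCAGT"


def tile_region(region_length, fragment_size=500, overlap=50):
    """Return 0-based, half-open (start, end) windows covering the region.

    Neighbouring windows share `overlap` bases so that no position is lost
    at a fragment boundary (the overlap value is an assumption).
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    if not 0 <= overlap < fragment_size:
        raise ValueError("overlap must be smaller than fragment_size")
    step = fragment_size - overlap
    windows = []
    start = 0
    while True:
        end = min(start + fragment_size, region_length)
        windows.append((start, end))
        if end == region_length:
            break
        start += step
    return windows


def tailed_primer(specific_bases, tail=COMMON_TAIL):
    """Place the common tail 5' (upstream) of the target-specific bases."""
    return tail + specific_bases
```

Because every amplification product then begins with the same tail, a single sequencing primer can be used for all fragments.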

Sequencing Of Amplified Genomic DNA And Identification Of Single Nucleotide Polymorphisms

The amplification products generated as described above are then sequenced using any
method known and available to the skilled technician. Methods for sequencing DNA using either
the dideoxy-mediated method (Sanger method) or the Maxam-Gilbert method are widely known to
those of ordinary skill in the art. Such methods are for example disclosed in Sambrook et al. (1989).
Alternative approaches include hybridization to high-density DNA probe arrays as described in Chee
et al. (1996).

Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing
reactions using a dye-primer cycle sequencing protocol. Following gel image analysis and DNA
sequence extraction, sequence data are automatically processed with adequate software to assess 
sequence quality. 

Polymorphism analysis software is used to detect the presence of biallelic sites among
individual or pooled amplified fragment sequences. The polymorphism search is based on the
presence of superimposed peaks in the electrophoresis pattern. These peaks, which present distinct
colors, correspond to two different nucleotides at the same position on the sequence. The
polymorphism has to be detected on both strands for validation.
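
A minimal sketch of the both-strand rule stated above is given below. It assumes that, for a given position, the four peak heights of each sequencing trace are available, and it treats a site as biallelic when two peaks are superimposed on the forward trace and the complementary two peaks are superimposed on the reverse trace. The peak-ratio threshold of 0.3 and all names are assumptions of the example and do not describe the software actually used.

```python
BASE_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def called_bases(peaks, min_ratio=0.3):
    """Return the bases whose peak reaches min_ratio of the tallest peak.

    peaks maps each base to its peak height at one trace position.
    """
    top = max(peaks.values())
    if top <= 0:
        return frozenset()
    return frozenset(b for b, h in peaks.items() if h >= min_ratio * top)


def is_biallelic_site(forward_peaks, reverse_peaks, min_ratio=0.3):
    """True if two superimposed peaks are seen consistently on both strands."""
    fwd = called_bases(forward_peaks, min_ratio)
    # The reverse read reports the complementary bases, so convert them back.
    rev = frozenset(BASE_COMPLEMENT[b] for b in called_bases(reverse_peaks, min_ratio))
    return len(fwd) == 2 and fwd == rev
```

A position showing A and G on the forward strand is thus accepted only if the reverse strand shows T and C at the corresponding position.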

Validation Of The Biallelic Markers Of The Present Invention 

The polymorphisms are evaluated for their usefulness as genetic markers by validating that
both alleles are present in a population. Validation of the biallelic markers is accomplished by
genotyping a group of individuals by a method of the invention and demonstrating that both alleles
are present. Microsequencing is a preferred method of genotyping alleles. The validation by
genotyping step may be performed on individual samples derived from each individual in the group
or by genotyping a pooled sample derived from more than one individual. The group can be as
small as one individual if that individual is heterozygous for the allele in question. Preferably the
group contains at least three individuals, more preferably the group contains five or six individuals,
so that a single validation test will be more likely to result in the validation of more of the biallelic
markers that are being tested. It should be noted, however, that when the validation test is
performed on a small group it may result in a false negative result if, as a result of sampling error,
none of the individuals tested carries one of the two alleles. Thus, the validation process is less
useful in demonstrating that a particular initial result is an artifact than it is at demonstrating that
there is a bona fide biallelic marker at a particular position in a sequence. All of the genotyping,
haplotyping, association, and interaction study methods of the invention may optionally be
performed solely with validated biallelic markers.
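
The risk of a false negative in a small validation group can be made concrete with a short calculation. The sketch below assumes that the 2n chromosomes of n unrelated individuals are drawn independently from a population in which the less frequent allele has frequency q, so that the probability that none of them carries that allele is (1 - q) raised to the power 2n. This sampling model is an assumption of the example and is not stated in the text.

```python
def prob_allele_missed(minor_allele_freq, n_individuals):
    """Probability that none of n diploid individuals carries the rarer allele.

    Assumes independent sampling of 2 * n_individuals chromosomes.
    """
    if not 0.0 <= minor_allele_freq <= 1.0:
        raise ValueError("frequency must lie between 0 and 1")
    if n_individuals < 0:
        raise ValueError("n_individuals must not be negative")
    return (1.0 - minor_allele_freq) ** (2 * n_individuals)
```

Under this model the probability of missing the rarer allele falls as the group grows, which is consistent with the preference for groups of at least three, and more preferably five or six, individuals.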

2. Genotyping of biallelic markers

The polymorphisms identified above can be further confirmed and their respective 
frequencies can be determined through various methods using the previously described primers and 
probes. These methods can also be useful for genotyping either new populations in association 
studies or individuals in the context of detection of alleles of biallelic markers which are known to 
be associated with a given trait. Those skilled in the art should note that the methods described
below can be equally performed on individual or pooled DNA samples.

Once a given polymorphic site has been found and characterized as a biallelic marker as
described above, several methods can be used in order to determine the specific allele carried by an 
individual at the given polymorphic base. 

The identification of biallelic markers described previously allows the design of appropriate 
primers to amplify a region of the olfactory receptor gene cluster containing the polymorphic site of
interest and for the detection of such polymorphisms. 

Genotyping can be performed using similar methods as those described above for the 
identification of the biallelic markers, or using other genotyping methods such as those further 
described below. In preferred embodiments, the comparison of sequences of amplified genomic 
fragments from different individuals is used to identify new biallelic markers whereas
microsequencing is used for genotyping known biallelic markers in diagnostic and genetic analysis
applications. 

In one embodiment the invention encompasses methods of genotyping comprising
determining the identity of a nucleotide at an olfactory receptor-related biallelic marker or the
complement thereof in a biological sample; optionally, wherein said olfactory receptor-related
biallelic marker is selected from the group consisting of A1 to A13, and the complements thereof;
optionally, wherein said biological sample is derived from a single subject; optionally, wherein the
identity of the nucleotides at said biallelic marker is determined for both copies of said biallelic
marker present in said individual's genome; optionally, wherein said biological sample is derived
from multiple subjects. Optionally, the genotyping methods of the invention encompass methods
with any further limitation described in this disclosure, or those following, specified alone or in any
combination: optionally, said method is performed in vitro; optionally, further comprising
amplifying a portion of said sequence comprising the biallelic marker prior to said determining step;
optionally, wherein said amplifying is performed by PCR, LCR, or replication of a recombinant
vector comprising an origin of replication and said fragment in a host cell; optionally, wherein said
determining is performed by a hybridization assay, a sequencing assay, a microsequencing assay, or
an enzyme-based mismatch detection assay.

Source of Nucleic Acids for genotyping 

Any source of nucleic acids, in purified or non-purified form, can be utilized as the starting
nucleic acid, provided it contains or is suspected of containing the specific nucleic acid sequence
desired. DNA or RNA may be extracted from cells, tissues, body fluids and the like as described
above. While nucleic acids for use in the genotyping methods of the invention can be derived from
any mammalian source, the test subjects and individuals from which nucleic acid samples are taken
are generally understood to be human.

Amplification Of DNA Fragments Comprising Biallelic Markers

Methods and polynucleotides are provided to amplify a segment of nucleotides comprising
one or more biallelic markers of the present invention. It will be appreciated that amplification of
DNA fragments comprising biallelic markers may be used in various methods and for various
purposes and is not restricted to genotyping. Nevertheless, many genotyping methods, although not
all, require the previous amplification of the DNA region carrying the biallelic marker of interest.
Such methods specifically increase the concentration or total number of sequences that span the
biallelic marker or include that site and sequences located either distal or proximal to it. Diagnostic
assays may also rely on amplification of DNA segments carrying a biallelic marker of the present
invention. Amplification of DNA may be achieved by any method known in the art. Amplification
techniques are described above in the section entitled "DNA Amplification".

Some of these amplification methods are particularly suited for the detection of single
nucleotide polymorphisms and allow the simultaneous amplification of a target sequence and the
identification of the polymorphic nucleotide, as is further described below.

The identification of biallelic markers as described above allows the design of appropriate
oligonucleotides, which can be used as primers to amplify DNA fragments comprising the biallelic
markers of the present invention. Amplification can be performed using the primers initially used to
discover new biallelic markers which are described herein or any set of primers allowing the
amplification of a DNA fragment comprising a biallelic marker of the present invention.

In some embodiments the present invention provides primers for amplifying a DNA
fragment containing one or more biallelic markers of the present invention. Preferred amplification
primers are listed in Example 3. It will be appreciated that the primers listed are merely exemplary
and that any other set of primers which produces amplification products containing one or more
biallelic markers of the present invention is also of use.

The spacing of the primers determines the length of the segment to be amplified. In the
context of the present invention, amplified segments carrying biallelic markers can range in size
from at least about 25 bp to 35 kbp. Amplification fragments from 25-3000 bp are typical,
fragments from 50-1000 bp are preferred and fragments from 100-600 bp are highly preferred. It
will be appreciated that amplification primers for the biallelic markers may be any sequence which
allows the specific amplification of any DNA fragment carrying the markers. Amplification primers
may be labeled or immobilized on a solid support as described in "Oligonucleotide probes and
primers".

Methods of Genotyping DNA samples for Biallelic Markers 

Any method known in the art can be used to identify the nucleotide present at a biallelic
marker site. Since the biallelic marker allele to be detected has been identified and specified in the
present invention, detection will prove simple for one of ordinary skill in the art by employing any
of a number of techniques. Many genotyping methods require the previous amplification of the
DNA region carrying the biallelic marker of interest. While the amplification of target or signal is
often preferred at present, ultrasensitive detection methods which do not require amplification are
also encompassed by the present genotyping methods. Methods well-known to those skilled in the
art that can be used to detect biallelic polymorphisms include methods such as conventional dot blot
analyses, single strand conformational polymorphism analysis (SSCP) described by Orita et
al. (1989), denaturing gradient gel electrophoresis (DGGE), heteroduplex analysis, mismatch
cleavage detection, and other conventional techniques as described in Sheffield et al. (1991), White
et al. (1992), and Grompe et al. (1989 and 1993). Another method for determining the identity of the
nucleotide present at a particular polymorphic site employs a specialized exonuclease-resistant
nucleotide derivative as described in US patent 4,656,127.

Preferred methods involve directly determining the identity of the nucleotide present at a
biallelic marker site by sequencing assay, enzyme-based mismatch detection assay, or hybridization
assay. The following is a description of some preferred methods. A highly preferred method is the
microsequencing technique. The term "sequencing" is generally used herein to refer to polymerase
extension of duplex primer/template complexes and includes both traditional sequencing and
microsequencing.

1) Sequencing Assays

The nucleotide present at a polymorphic site can be determined by sequencing methods. In
a preferred embodiment, DNA samples are subjected to PCR amplification before sequencing as
described above. DNA sequencing methods are described in "Sequencing Of Amplified Genomic
DNA And Identification Of Single Nucleotide Polymorphisms".

Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing 
reactions using a dye-primer cycle sequencing protocol. Sequence analysis allows the identification 
of the base present at the biallelic marker site. 

2) Microsequencing Assays

In microsequencing methods, the nucleotide at a polymorphic site in a target DNA is
detected by a single nucleotide primer extension reaction. This method involves appropriate
microsequencing primers which hybridize just upstream of the polymorphic base of interest in the
target nucleic acid. A polymerase is used to specifically extend the 3' end of the primer with one
single ddNTP (chain terminator) complementary to the nucleotide at the polymorphic site. Next, the
identity of the incorporated nucleotide is determined in any suitable way.
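
The geometry of a microsequencing primer can be sketched as follows: the primer ends with its 3' terminus immediately adjacent to the polymorphic base, and the ddNTP that is incorporated reveals the allele. The sketch covers primers designed on either strand; the default primer length of 19 bases, the strand labels and the function names are assumptions made for illustration and are not the primers of Example 5.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def microsequencing_design(top_strand, snp_index, alleles, length=19, strand="+"):
    """Return (primer, {top-strand allele: ddNTP incorporated}).

    top_strand is written 5'->3' and snp_index is the 0-based position of the
    polymorphic base. A "+" primer has the top-strand sequence just 5' of the
    site; a "-" primer has the bottom-strand sequence just 5' of the site.
    """
    if strand == "+":
        if snp_index < length:
            raise ValueError("not enough upstream sequence")
        primer = top_strand[snp_index - length:snp_index]
        # The added terminator has the same identity as the top-strand allele.
        terminators = {a: "dd" + a + "TP" for a in alleles}
    elif strand == "-":
        if snp_index + 1 + length > len(top_strand):
            raise ValueError("not enough downstream sequence")
        primer = reverse_complement(top_strand[snp_index + 1:snp_index + 1 + length])
        # The added terminator is complementary to the top-strand allele.
        terminators = {a: "dd" + a.translate(COMPLEMENT) + "TP" for a in alleles}
    else:
        raise ValueError("strand must be '+' or '-'")
    return primer, terminators
```

Designing one primer on each strand gives two independent readouts of the same site.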

Typically, microsequencing reactions are carried out using fluorescent ddNTPs and the
extended microsequencing primers are analyzed by electrophoresis on ABI 377 sequencing
machines to determine the identity of the incorporated nucleotide as described in EP 412 883.
Alternatively, capillary electrophoresis can be used in order to process a higher number of assays
simultaneously. An example of a typical microsequencing procedure that can be used in the context
of the present invention is provided in Example 5.

Different approaches can be used for the labeling and detection of ddNTPs. A homogeneous
phase detection method based on fluorescence resonance energy transfer has been described by Chen
and Kwok (1997) and Chen et al. (1997). In this method, amplified genomic DNA fragments
containing polymorphic sites are incubated with a 5'-fluorescein-labeled primer in the presence of
allelic dye-labeled dideoxyribonucleoside triphosphates and a modified Taq polymerase. The
dye-labeled primer is extended one base by the dye-terminator specific for the allele present on the
template. At the end of the genotyping reaction, the fluorescence intensities of the two dyes in the
reaction mixture are analyzed directly without separation or purification. All these steps can be
performed in the same tube and the fluorescence changes can be monitored in real time.
Alternatively, the extended primer may be analyzed by MALDI-TOF Mass Spectrometry. The base
at the polymorphic site is identified by the mass added onto the microsequencing primer (see Haff
and Smirnov, 1997).

Microsequencing may be achieved by the established microsequencing method or by
developments or derivatives thereof. Alternative methods include several solid-phase
microsequencing techniques. The basic microsequencing protocol is the same as described
previously, except that the method is conducted as a heterogeneous phase assay, in which the primer
or the target molecule is immobilized or captured onto a solid support. To simplify the primer
separation and the terminal nucleotide addition analysis, oligonucleotides are attached to solid
supports or are modified in ways that permit affinity separation as well as polymerase
extension. The 5' ends and internal nucleotides of synthetic oligonucleotides can be modified in a
number of different ways to permit different affinity separation approaches, e.g., biotinylation. If a
single affinity group is used on the oligonucleotides, the oligonucleotides can be separated from the
incorporated terminator reagent. This eliminates the need for physical or size separation. More than
one oligonucleotide can be separated from the terminator reagent and analyzed simultaneously if
more than one affinity group is used. This permits the analysis of several nucleic acid species or
more nucleic acid sequence information per extension reaction. The affinity group need not be on
the priming oligonucleotide but could alternatively be present on the template. For example,
immobilization can be carried out via an interaction between biotinylated DNA and
streptavidin-coated microtitration wells or avidin-coated polystyrene particles. In the same manner,
oligonucleotides or templates may be attached to a solid support in a high-density format. In such
solid phase microsequencing reactions, incorporated ddNTPs can be radiolabeled (Syvanen, 1994)
or linked to fluorescein (Livak and Hainer, 1994). The detection of radiolabeled ddNTPs can be
achieved through scintillation-based techniques. The detection of fluorescein-linked ddNTPs can be
based on the binding of antifluorescein antibody conjugated with alkaline phosphatase, followed by
incubation with a chromogenic substrate (such as p-nitrophenyl phosphate). Other possible
reporter-detection pairs include: ddNTP linked to dinitrophenyl (DNP) and anti-DNP alkaline
phosphatase conjugate (Harju et al., 1993) or biotinylated ddNTP and horseradish peroxidase-conjugated
streptavidin with o-phenylenediamine as a substrate (WO 92/15712). As yet another alternative
solid-phase microsequencing procedure, Nyren et al. (1993) described a method relying on the
detection of DNA polymerase activity by an enzymatic luminometric inorganic pyrophosphate
detection assay (ELIDA).

Pastinen et al. (1997) describe a method for multiplex detection of single nucleotide
polymorphism in which the solid phase minisequencing principle is applied to an oligonucleotide
array format. High-density arrays of DNA probes attached to a solid support (DNA chips) are
further described below.

In one aspect the present invention provides polynucleotides and methods to genotype one or
more biallelic markers of the present invention by performing a microsequencing assay. Preferred
microsequencing primers include the nucleotide sequences D1 to Dn and E1 to En. It will be
appreciated that the microsequencing primers listed in Example 5 are merely exemplary and that
any primer having a 3' end immediately adjacent to the polymorphic nucleotide may be used.
Similarly, it will be appreciated that microsequencing analysis may be performed for any biallelic
marker or any combination of biallelic markers of the present invention. One aspect of the present
invention is a solid support which includes one or more microsequencing primers listed in Example
5, or fragments comprising at least 8, 12, 15, 20, 25, 30, 40, or 50 consecutive nucleotides thereof, to
the extent that such lengths are consistent with the primer described, and having a 3' terminus
immediately upstream of the corresponding biallelic marker, for determining the identity of a
nucleotide at a biallelic marker site.

3) Mismatch detection assays based on polymerases and ligases 

In one aspect the present invention provides polynucleotides and methods to determine the
allele of one or more biallelic markers of the present invention in a biological sample, by mismatch
detection assays based on polymerases and/or ligases. These assays are based on the specificity of
polymerases and ligases. Polymerization reactions place particularly stringent requirements on
correct base pairing of the 3' end of the amplification primer, and the joining of two oligonucleotides
hybridized to a target DNA sequence is quite sensitive to mismatches close to the ligation site,
especially at the 3' end. Methods, primers and various parameters to amplify DNA fragments
comprising biallelic markers of the present invention are further described above in "Amplification
Of DNA Fragments Comprising Biallelic Markers".

Allele Specific Amplification Primers

Discrimination between the two alleles of a biallelic marker can also be achieved by allele
specific amplification, a selective strategy, whereby one of the alleles is amplified without
amplification of the other allele. For allele specific amplification, at least one member of the pair of
primers is sufficiently complementary with a region of an olfactory receptor gene comprising the
polymorphic base of a biallelic marker of the present invention to hybridize therewith and to initiate
the amplification. Such primers are able to discriminate between the two alleles of a biallelic
marker.

This is accomplished by placing the polymorphic base at the 3' end of one of the
amplification primers. Because the extension forms from the 3' end of the primer, a mismatch at or
near this position has an inhibitory effect on amplification. Therefore, under appropriate
amplification conditions, these primers only direct amplification on their complementary allele.
Determining the precise location of the mismatch and the corresponding assay conditions are well
within the ordinary skill in the art.
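
The placement of the polymorphic base at the 3' end of the primer can be sketched as follows. The example builds one primer per allele from the top strand and uses an idealised rule in which extension occurs only when the 3' terminal base matches the allele carried by the template; in practice discrimination depends on the amplification conditions, as noted above. The primer length and function names are assumptions of the example.

```python
def allele_specific_primers(top_strand, snp_index, alleles, length=20):
    """Return {allele: primer}, each primer ending at its 3' end with that allele.

    top_strand is written 5'->3' and snp_index is the 0-based position of the
    polymorphic base; the primers have the top-strand sequence.
    """
    first = snp_index - length + 1
    if first < 0:
        raise ValueError("not enough upstream sequence")
    stem = top_strand[first:snp_index]
    return {allele: stem + allele for allele in alleles}


def primes_extension(primer, template_allele):
    """Idealised rule: extend only if the primer's 3' base matches the allele."""
    return primer[-1] == template_allele
```

With two such primers used in separate reactions, the presence or absence of product in each reaction indicates which alleles the sample carries.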

Ligation/Amplification Based Methods

The "Oligonucleotide Ligation Assay" (OLA) uses two oligonucleotides which are designed
to be capable of hybridizing to abutting sequences of a single strand of a target molecule. One of
the oligonucleotides is biotinylated, and the other is detectably labeled. If the precise
complementary sequence is found in a target molecule, the oligonucleotides will hybridize such that
their termini abut, and create a ligation substrate that can be captured and detected. OLA is capable
of detecting single nucleotide polymorphisms and may be advantageously combined with PCR as
described by Nickerson et al. (1990). In this method, PCR is used to achieve the exponential
amplification of target DNA, which is then detected using OLA.
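
The requirement that the two OLA oligonucleotides hybridize to abutting sequences of the same target strand can be sketched as a simple check. The model below is deliberately idealised: an oligonucleotide is considered to hybridize only if its exact complement occurs in the target, and ligation is considered possible only if the 3' end of one oligonucleotide lies immediately next to the 5' end of the other. The function and parameter names are assumptions of the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def ola_ligates(target, probe_3oh, probe_5p):
    """True if both probes match the target exactly and abut for ligation.

    target and both probes are written 5'->3'. probe_3oh contributes the
    3' hydroxyl at the junction and probe_5p the 5' phosphate, so on the
    target strand the site of probe_5p must end where the site of
    probe_3oh begins.
    """
    site_3oh = target.find(reverse_complement(probe_3oh))
    site_5p = target.find(reverse_complement(probe_5p))
    if site_3oh == -1 or site_5p == -1:
        return False
    return site_5p + len(probe_5p) == site_3oh
```

When the allele-discriminating base is placed at the 3' end of `probe_3oh`, a target carrying the other allele gives no ligation product in this model.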

Other amplification methods which are particularly suited for the detection of single
nucleotide polymorphisms include LCR (ligase chain reaction) and Gap LCR (GLCR), which are
described above in "DNA Amplification". LCR uses two pairs of probes to exponentially amplify a
specific target. The sequences of each pair of oligonucleotides are selected to permit the pair to
hybridize to abutting sequences of the same strand of the target. Such hybridization forms a
substrate for a template-dependent ligase. In accordance with the present invention, LCR can be
performed with oligonucleotides having the proximal and distal sequences of the same strand of a
biallelic marker site. In one embodiment, either oligonucleotide will be designed to include the
biallelic marker site. In such an embodiment, the reaction conditions are selected such that the
oligonucleotides can be ligated together only if the target molecule either contains or lacks the
specific nucleotide that is complementary to the biallelic marker on the oligonucleotide. In an
alternative embodiment, the oligonucleotides will not include the biallelic marker, such that when
they hybridize to the target molecule, a "gap" is created as described in WO 90/01069. This gap is
then "filled" with complementary dNTPs (as mediated by DNA polymerase), or by an additional
pair of oligonucleotides. Thus at the end of each cycle, each single strand has a complement capable
of serving as a target during the next cycle and exponential allele-specific amplification of the
desired sequence is obtained.

Ligase/Polymerase-mediated Genetic Bit Analysis™ is another method for determining the
identity of a nucleotide at a preselected site in a nucleic acid molecule (WO 95/21271). This method
involves the incorporation of a nucleoside triphosphate that is complementary to the nucleotide
present at the preselected site onto the terminus of a primer molecule, and their subsequent ligation
to a second oligonucleotide. The reaction is monitored by detecting a specific label attached to the
reaction's solid phase or by detection in solution.
4) Hybridization Assay Methods 
A preferred method of determining the identity of the nucleotide present at a biallelic marker
site involves nucleic acid hybridization. The hybridization probes, which can be conveniently used
in such reactions, preferably include the probes defined herein. Any hybridization assay may be
used including Southern hybridization, Northern hybridization, dot blot hybridization and
solid-phase hybridization (see Sambrook et al., 1989).

Hybridization refers to the formation of a duplex structure by two single stranded nucleic
acids due to complementary base pairing. Hybridization can occur between exactly complementary
nucleic acid strands or between nucleic acid strands that contain minor regions of mismatch.
Specific probes can be designed that hybridize to one form of a biallelic marker and not to the other
and therefore are able to discriminate between different allelic forms. Allele-specific probes are
often used in pairs, one member of a pair showing perfect match to a target sequence containing the
original allele and the other showing a perfect match to the target sequence containing the alternative
allele. Hybridization conditions should be sufficiently stringent that there is a significant difference
in hybridization intensity between alleles, and preferably an essentially binary response, whereby a
probe hybridizes to only one of the alleles. Stringent, sequence specific hybridization conditions,
under which a probe will hybridize only to the exactly complementary target sequence, are well
known in the art (Sambrook et al., 1989). Stringent conditions are sequence dependent and will be
different in different circumstances. Generally, stringent conditions are selected to be about 5°C
lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and
pH. Although such hybridization can be performed in solution, it is preferred to employ a
solid-phase hybridization assay. The target DNA comprising a biallelic marker of the present invention
may be amplified prior to the hybridization reaction. The presence of a specific allele in the sample
is determined by detecting the presence or the absence of stable hybrid duplexes formed between the
probe and the target DNA. The detection of hybrid duplexes can be carried out by a number of
methods. Various detection assay formats are well known which utilize detectable labels bound to
either the target or the probe to enable detection of the hybrid duplexes. Typically, hybridization
duplexes are separated from unhybridized nucleic acids and the labels bound to the duplexes are then
detected. Those skilled in the art will recognize that wash steps may be employed to wash away
excess target DNA or probe as well as unbound conjugate. Further, standard heterogeneous assay
formats are suitable for detecting the hybrids using the labels present on the primers and probes.
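
The rule of thumb stated above, that stringent conditions lie about 5°C below the Tm, can be illustrated with a short calculation. The present application does not specify how Tm is computed; the sketch below assumes the Wallace rule (2°C per A/T and 4°C per G/C), a rough approximation commonly used for short oligonucleotides at high salt, and the function names are hypothetical.

```python
def wallace_tm(probe):
    """Approximate Tm in degrees C using the Wallace rule (an assumption here)."""
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    if at + gc != len(probe):
        raise ValueError("probe must contain only A, C, G and T")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(probe, offset_c=5.0):
    """Hybridization temperature set about `offset_c` degrees below the Tm."""
    return wallace_tm(probe) - offset_c


probe = "ACGTACGTACGTAC"
print(wallace_tm(probe), stringent_temperature(probe))
```

For longer probes or defined buffer conditions, a nearest-neighbor model would normally replace this approximation.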

Two recently developed assays allow hybridization-based allele discrimination with no need
for separations or washes (see Landegren U. et al., 1998). The TaqMan assay takes advantage of
the 5' nuclease activity of Taq DNA polymerase to digest a DNA probe annealed specifically to the
accumulating amplification product. TaqMan probes are labeled with a donor-acceptor dye pair that
interacts via fluorescence energy transfer. Cleavage of the TaqMan probe by the advancing
polymerase during amplification dissociates the donor dye from the quenching acceptor dye, greatly
increasing the donor fluorescence. All reagents necessary to detect two allelic variants can be
assembled at the beginning of the reaction and the results are monitored in real time (see Livak et al.,
1995). In an alternative homogeneous hybridization based procedure, molecular beacons are used
for allele discriminations. Molecular beacons are hairpin-shaped oligonucleotide probes that report
the presence of specific nucleic acids in homogeneous solutions. When they bind to their targets
they undergo a conformational reorganization that restores the fluorescence of an internally
quenched fluorophore (Tyagi et al., 1998).

The polynucleotides provided herein can be used to produce probes which can be used in 
hybridization assays for the detection of biallelic marker alleles in biological samples. These probes 
are characterized in that they preferably comprise between 8 and 50 nucleotides, and in that they are 
sufficiently complementary to a sequence comprising a biallelic marker of the present invention to 
hybridize thereto and preferably sufficiently specific to be able to discriminate the targeted sequence
for only one nucleotide variation. A particularly preferred probe is 25 nucleotides in length.
Preferably the biallelic marker is within 4 nucleotides of the center of the polynucleotide probe. In
particularly preferred probes, the biallelic marker is at the center of said polynucleotide. Preferred
probes comprise a nucleotide sequence selected from the group consisting of amplicons listed in
Table 1 and the sequences complementary thereto, or a fragment thereof, said fragment comprising
at least about 8 consecutive nucleotides, preferably 10, 15, 20, more preferably 25, 30, 40, 47, or 50
consecutive nucleotides and containing a polymorphic base. Preferred probes comprise a nucleotide
sequence selected from the group consisting of P1 to P13 and the sequences complementary thereto.
In preferred embodiments the polymorphic base(s) are within 5, 4, 3, 2, or 1 nucleotides of the center
of the said polynucleotide, more preferably at the center of said polynucleotide.
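
The preferred probe geometry described above (a 25-nucleotide probe with the biallelic marker at its center) can be sketched as follows. The function name, the example sequence and the marker position are hypothetical and serve only to illustrate how a pair of allele-specific probes differing at the central base might be derived from a flanking sequence.

```python
def centered_probes(sequence, marker_index, alleles, length=25):
    """Return one probe per allele with the polymorphic base at the probe center.

    `sequence` is the genomic context written 5'->3' and `marker_index` is the
    0-based position of the biallelic marker (both hypothetical inputs).
    """
    if length % 2 == 0:
        raise ValueError("use an odd length so that the marker can be centered")
    half = length // 2
    start, end = marker_index - half, marker_index + half + 1
    if start < 0 or end > len(sequence):
        raise ValueError("not enough flanking sequence around the marker")
    left = sequence[start:marker_index]
    right = sequence[marker_index + 1:end]
    return {allele: left + allele + right for allele in alleles}


probes = centered_probes("ACGT" * 10, 20, ("A", "G"))
for allele, probe in probes.items():
    print(allele, probe, len(probe))
```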

Preferably the probes of the present invention are labeled or immobilized on a solid support. 
Labels and solid supports are further described in "Oligonucleotide Probes and Primers". The 
probes can be non-extendable as described in "Oligonucleotide Probes and Primers". 

By assaying the hybridization to an allele specific probe, one can detect the presence or
absence of a biallelic marker allele in a given sample. High-throughput parallel hybridization in
array format is specifically encompassed within "hybridization assays" and is described below.
5) Hybridization To Addressable Arrays Of Oligonucleotides 

Hybridization assays based on oligonucleotide arrays rely on the differences in hybridization
stability of short oligonucleotides to perfectly matched and mismatched target sequence variants.
Efficient access to polymorphism information is obtained through a basic structure comprising
high-density arrays of oligonucleotide probes attached to a solid support (e.g., the chip) at selected
positions. Each DNA chip can contain thousands to millions of individual synthetic DNA probes
arranged in a grid-like pattern and miniaturized to the size of a dime.

The chip technology has already been applied with success in numerous cases. For example,
the screening of mutations has been undertaken in the BRCA1 gene, in S. cerevisiae mutant strains,
and in the protease gene of HIV-1 virus (Hacia et al., 1996; Shoemaker et al., 1996; Kozal et al.,
1996). Chips of various formats for use in detecting biallelic polymorphisms can be produced on a
customized basis by Affymetrix (GeneChip™), Hyseq (HyChip and HyGnostics), and Protogene 
Laboratories. 

In general, these methods employ arrays of oligonucleotide probes that are complementary
to target nucleic acid sequence segments from an individual, which target sequences include a
polymorphic marker. EP 785280 describes a tiling strategy for the detection of single nucleotide
polymorphisms. Briefly, arrays may generally be "tiled" for a large number of specific
polymorphisms. By "tiling" is generally meant the synthesis of a defined set of oligonucleotide
probes which is made up of a sequence complementary to the target sequence of interest, as well as
preselected variations of that sequence, e.g., substitution of one or more given positions with one or
more members of the basis set of nucleotides. Tiling strategies are further described in PCT
application No. WO 95/11995. In a particular aspect, arrays are tiled for a number of specific,
identified biallelic marker sequences. In particular, the array is tiled to include a number of
detection blocks, each detection block being specific for a specific biallelic marker or a set of
biallelic markers. For example, a detection block may be tiled to include a number of probes, which
span the sequence segment that includes a specific polymorphism. To ensure probes that are
complementary to each allele, the probes are synthesized in pairs differing at the biallelic marker. In
addition to the probes differing at the polymorphic base, monosubstituted probes are also generally
tiled within the detection block. These monosubstituted probes have bases at and up to a certain
number of bases in either direction from the polymorphism, substituted with the remaining
nucleotides (selected from A, T, G, C and U). Typically the probes in a tiled detection block will
include substitutions of the sequence positions up to and including those that are 5 bases away from
the biallelic marker. The monosubstituted probes provide internal controls for the tiled array, to
distinguish actual hybridization from artefactual cross-hybridization. Upon completion of
hybridization with the target sequence and washing of the array, the array is scanned to determine
the position on the array to which the target sequence hybridizes. The hybridization data from the
scanned array is then analyzed to identify which allele or alleles of the biallelic marker are present in
the sample. Hybridization and scanning may be carried out as described in PCT applications Nos. WO
92/10092 and WO 95/11995 and US patent No. 5,424,186.
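
The layout of a tiled detection block described above can be sketched as follows: a pair of probes differing at the biallelic marker, plus monosubstituted control probes covering the marker position and the positions up to 5 bases on either side. This is only an illustrative model under stated assumptions; the function name, the tuple layout, the DNA-only base set and the example sequence are hypothetical and do not reproduce the tiling schemes of the cited references.

```python
BASES = "ACGT"  # DNA-only basis set, assumed for this sketch


def tile_detection_block(sequence, marker_index, alleles, length=25, window=5):
    """Return (kind, position, base, probe) tuples for one detection block."""
    half = length // 2
    start, end = marker_index - half, marker_index + half + 1
    if start < 0 or end > len(sequence) or window > half:
        raise ValueError("insufficient flanking sequence or window too large")
    ref = sequence[start:end]
    center = half
    block = []
    # Probe pair differing only at the biallelic marker.
    for allele in alleles:
        block.append(("allele", center, allele, ref[:center] + allele + ref[center + 1:]))
    # Monosubstituted controls at and around the marker position.
    for pos in range(center - window, center + window + 1):
        excluded = set(alleles) if pos == center else {ref[pos]}
        for base in BASES:
            if base not in excluded:
                block.append(("control", pos, base, ref[:pos] + base + ref[pos + 1:]))
    return block


block = tile_detection_block("ACGT" * 10, 20, ("A", "G"))
print(len(block), "probes in the detection block")
```

With two alleles, a 25-nucleotide probe and a window of 5 bases, the block holds the allele pair, two controls at the marker position and three controls at each of the ten flanking positions.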

Thus, in some embodiments, the chips may comprise an array of nucleic acid sequences of
fragments of about 15 nucleotides in length. In further embodiments, the chip may comprise an
array including at least one of the sequences selected from the group consisting of amplicons listed
in Table 1 and the sequences complementary thereto, or a fragment thereof, said fragment comprising
at least about 8 consecutive nucleotides, preferably 10, 15, 20, more preferably 25, 30, 40, 47, or 50
consecutive nucleotides and containing a polymorphic base. In preferred embodiments the
polymorphic base is within 5, 4, 3, 2, or 1 nucleotides of the center of the said polynucleotide, more
preferably at the center of said polynucleotide. In some embodiments, the chip may comprise an
array of at least 2, 3, 4, 5, 6, 7, 8 or more of these polynucleotides of the invention. Solid supports 
and polynucleotides of the present invention attached to solid supports are further described in 
"Oligonucleotide Probes And Primers". 
6) Integrated Systems 

Another technique, which may be used to analyze polymorphisms, includes multicomponent
integrated systems, which miniaturize and compartmentalize processes such as PCR and capillary
electrophoresis reactions in a single functional device. An example of such a technique is disclosed in
US patent 5,589,136, which describes the integration of PCR amplification and capillary
electrophoresis in chips.

Integrated systems can be envisaged mainly when microfluidic systems are used. These
systems comprise a pattern of microchannels designed onto a glass, silicon, quartz, or plastic wafer
included on a microchip. The movements of the samples are controlled by electric, electroosmotic
or hydrostatic forces applied across different areas of the microchip to create functional microscopic
valves and pumps with no moving parts.

For genotyping biallelic markers, the microfluidic system may integrate nucleic acid
amplification, microsequencing, capillary electrophoresis and a detection method such as
laser-induced fluorescence detection.

E. EXPRESSION OF AN OLF1 TO OLF10 CODING POLYNUCLEOTIDE

Any of the coding polynucleotides of the invention may be inserted into recombinant vectors
for expression in a recombinant host cell or a recombinant host organism.

Thus, the present invention also encompasses a family of recombinant vectors that contains 
a coding polynucleotide from the group of coding polynucleotides OLF1 to OLF10 genes. 
Consequently, the present invention further deals with a recombinant vector comprising a
polynucleotide comprising any of the coding sequences of SEQ ID No 1, preferably those selected
from the group consisting of SEQ ID Nos 2-11.

In a first preferred embodiment, the present invention relates to expression vectors which 
include nucleic acids encoding an olfactory receptor protein described herein under the control of an 
exogenous regulatory sequence. 

In a second preferred embodiment, a recombinant vector of the invention is used to amplify
the inserted polynucleotide derived from an olfactory receptor genomic sequence selected from the
group consisting of the nucleic acids of SEQ ID No 1 and of olfactory receptor cDNAs, for example
the open reading frames of SEQ ID Nos 2-11, in a suitable cell host, this polynucleotide being
amplified every time that the recombinant vector replicates.

More particularly, the present invention relates to expression vectors which include nucleic
acids encoding an olfactory receptor protein, preferably the olfactory receptor proteins of the amino
acid sequence of SEQ ID Nos 12-21 or variants or fragments thereof, under the control of an
exogenous regulatory sequence.

Generally, a recombinant vector of the invention may comprise any of the polynucleotides 
described herein, including regulatory sequences, and coding sequences, as well as any olfactory 
receptor primer or probe as defined above. More particularly, the recombinant vectors of the present 
invention can comprise any of the polynucleotides described in the "Coding Regions of the olfactory
receptor gene" section, "Genomic sequence of the olfactory receptor gene" section, the 
"Oligonucleotide Probes And Primers" section and the "Polynucleotide constructs" section. 

Some of the elements which can be found in the vectors of the present invention are 
described in further detail in the following sections. 

Vectors

A recombinant vector according to the invention comprises, but is not limited to, a YAC
(Yeast Artificial Chromosome), a BAC (Bacterial Artificial Chromosome), a phage, a phagemid, a
cosmid, a plasmid or even a linear DNA molecule which may consist of a chromosomal,
non-chromosomal and synthetic DNA. Such a recombinant vector can comprise a transcriptional unit
comprising an assembly of:

(1) a genetic element or elements having a regulatory role in gene expression, for 
example promoters or enhancers. Enhancers are cis-acting elements of DNA, usually from about 
10 to 300 bp that act on the promoter to increase the transcription. 

(2) a structural or coding sequence which is transcribed into mRNA and eventually
translated into a polypeptide, and

(3) appropriate transcription initiation and termination sequences. Structural units
intended for use in yeast or eukaryotic expression systems preferably include a leader sequence
enabling extracellular secretion of translated protein by a host cell. Alternatively, where
recombinant protein is expressed without a leader or transport sequence, it may include an
N-terminal residue. This residue may or may not be subsequently cleaved from the expressed
recombinant protein to provide a final product.

Generally, recombinant expression vectors will include origins of replication, selectable
markers permitting transformation of the host cell, and a promoter derived from a highly expressed
gene to direct transcription of a downstream structural sequence. The heterologous structural
sequence is assembled in appropriate phase with translation initiation and termination sequences,
and preferably a leader sequence capable of directing secretion of translated protein into the
periplasmic space or extracellular medium.

The selectable marker genes for selection of transformed host cells are preferably
dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, TRP1 for S. cerevisiae or
tetracycline, rifampicin or ampicillin resistance in E. coli, or levan saccharase for mycobacteria.

For facilitating the purification of the expressed protein and increasing its stability, the
coding sequence of an olfactory receptor according to the invention can be fused at its N- or
C-terminus with a protein such as MBP (maltose binding protein) or GST (glutathione S-transferase),
or with a tag such as a poly-histidine tag, Strep tag, Bio tag, or flag peptide epitope tag, those being
detailed below. Thioredoxin can optionally be inserted between the olfactory receptor and the tag.

Useful expression vectors for bacterial use are constructed by inserting a structural DNA
sequence encoding a desired polypeptide with suitable translation initiation and termination signals
in operable reading phase with a functional promoter. The vector will comprise one or more
phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and
to, if desirable, provide amplification within the host.

As a representative but non-limiting example, useful expression vectors for bacterial use can 
comprise a selectable marker and bacterial origin of replication derived from commercially available
plasmids comprising genetic elements of pBR322 (ATCC 37017). Such commercial vectors include, 
for example, pKK223-3 (Pharmacia, Uppsala, Sweden), and GEM1 (Promega Biotec, Madison, WI, 
USA). 

Large numbers of suitable vectors and promoters are known to those of skill in the art, and
commercially available, such as bacterial vectors: pQE70, pQE60, pQE-9 (Qiagen), pbs, pD10,
phagescript, psiX174, pbluescript SK, pbsks, pNH8A, pNH16A, pNH18A, pNH46A (Stratagene);
ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); or eukaryotic vectors: pWLNEO,
pSV2CAT, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia);
baculovirus transfer vector pVL1392/1393 (Pharmingen); pQE-30 (QIAexpress).

A suitable vector for the expression of the olfactory receptors defined above or their peptide
fragments is a baculovirus vector that can be propagated in insect cells and in insect cell lines. A
specific suitable host vector system is the pVL1392/1393 baculovirus transfer vector (Pharmingen)
that is used to transfect the SF9 cell line (ATCC N° CRL 1711), which is derived from Spodoptera
frugiperda.

Other suitable vectors for the expression of an olfactory receptor or their peptide fragments
or variants in a baculovirus expression system include those described by Chai et al. (1993), Vlasak
et al. (1983) and Lenhard et al. (1996).

Mammalian expression vectors will comprise an origin of replication, a suitable promoter
and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and
acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences.
DNA sequences derived from the SV40 viral genome, for example SV40 origin, early promoter,
enhancer, splice and polyadenylation sites may be used to provide the required nontranscribed
genetic elements.

Promoters 

The suitable promoter regions used in the expression vectors according to the present
invention are chosen taking into account the cell host in which the heterologous gene has to be
expressed.

A suitable promoter may be heterologous with respect to the nucleic acid for which it 
controls the expression or alternatively can be endogenous to the native polynucleotide containing 
the coding sequence to be expressed. Additionally, the promoter is generally heterologous with 
respect to the recombinant vector sequences within which the construct promoter/coding sequence
has been inserted. 

Thus, the promoter is selected among the group comprising : 

- an internal or an endogenous promoter, such as the natural promoter associated
with the structural gene coding for the desired olfactory receptor polypeptide or the fragment or
variant thereof; such a promoter may be completed by a regulatory element derived from the
vertebrate host, in particular an activator element;

- a promoter derived from a cytoskeletal protein gene such as the desmin promoter
(Bolmont et al., 1990; Zhenlin et al., 1989) or a promoter derived from a gene specifically expressed
in epithelial cells and most preferably in olfactory epithelial cells.

Preferred bacterial promoters are the LacI, LacZ, the T3 or T7 bacteriophage RNA
polymerase promoters, the polyhedrin promoter, or the p10 protein promoter from baculovirus (Kit
Novagen) (Smith et al., 1983; O'Reilly et al., 1992), the lambda PR promoter or also the trc
promoter.

Promoter regions can be selected from any desired gene using, for example, CAT
(chloramphenicol transferase) vectors and more preferably pKK232-8 and pCM7 vectors.
Particularly preferred bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp.
Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40,
LTRs from retrovirus, and mouse metallothionein-I. Selection of a convenient vector and promoter
is well within the level of ordinary skill in the art.

The choice of a particular promoter among the above-described promoters is well within the
ability of one skilled in the art, guided by knowledge of the genetic engineering field, by
the book of Sambrook et al. (1989), or by the procedures described by
Fuller et al. (1996).

A preferred constitutive promoter is one of the internal promoters that are active
in resting fibroblasts, such as the promoter of the phosphoglycerate kinase gene (PGK-1). The
PGK-1 promoter is either the mouse promoter or the human promoter, such as described by Adra et al.
(1987). Other constitutive promoters may also be used, such as the beta-actin promoter (Kort et al.,
1983) or the vimentin promoter (Rettlez and Basenga, 1987).

The vector containing the appropriate DNA sequence as described above, more preferably an
OLF1 to OLF10 coding polynucleotide, can be utilized to transform an appropriate host to allow the
expression of the desired polypeptide or polynucleotide.

Other types of vectors 

The in vivo expression of an olfactory receptor polypeptide encompassed by the invention or
a fragment or a variant thereof may be useful in order to correct a genetic defect related to the
expression of the native gene in a host organism or to the production of biologically active olfactory
receptor proteins.

Consequently, the present invention also deals with recombinant expression vectors mainly 
designed for the in vivo production of a therapeutic peptide fragment by the introduction of the 
genetic information in the organism of the patient to be treated. This genetic information may be 
introduced in vitro in a cell that has been previously extracted from the organism, the modified cell 
being subsequently reintroduced in the said organism, directly in vivo into the appropriate tissue, and
preferably in the olfactory epithelium. 

One specific embodiment for a method for delivering the corresponding protein or peptide to 
the interior of a cell of a vertebrate in vivo comprises the step of introducing a preparation 
comprising a physiologically acceptable carrier and a naked polynucleotide operatively coding for 
the polypeptide into the interstitial space of a tissue comprising the cell, whereby the naked
polynucleotide is taken up into the interior of the cell and has a physiological effect. 

In a specific embodiment, the invention provides a composition for the in vivo production of 
an olfactory receptor polypeptide described therein containing a naked polynucleotide operatively 
coding for an olfactory receptor selected from the group of OLF1 to OLF10 or a fragment or a 
variant thereof, in solution in a physiologically acceptable carrier and suitable for introduction into a
tissue to cause cells of the tissue to express the said protein or polypeptide. 

Advantageously, the composition described above is administered locally, near the site in 
which the expression of the olfactory receptor polypeptide under consideration or a fragment or a 
variant thereof is sought. 

The polynucleotide operatively coding for an olfactory receptor polypeptide or a fragment or
variant thereof may be a vector comprising the genomic DNA or the complementary DNA (cDNA)
coding for the corresponding protein and a promoter sequence allowing the expression of the
genomic DNA or the complementary DNA in the desired eukaryotic cells, such as vertebrate cells,
specifically mammalian cells.

This vector may also contain one origin of replication that allows it to replicate in the
eukaryotic host cell such as an origin of replication from a bovine papillomavirus. Alternatively, the
vector can contain several, for example two, origins of replication of different origins in order to
allow said vector to replicate in different host cells, typically both in a prokaryotic cell such as E.
coli and in a eukaryotic cell such as a mammalian epithelial cell, preferably a mammalian olfactory
epithelial cell.

Compositions comprising a polynucleotide are described in the PCT application N° WO
90/11092 (Vical Inc.) and also in the PCT application N° WO 95/11307 (Institut Pasteur, INSERM,
Université d'Ottawa) as well as in the articles of Tacson et al. (1996) and of Huygen et al. (1996).

In another embodiment, the DNA to be introduced is complexed with DEAE-dextran
(Pagano et al., 1967) or with nuclear proteins (Kaneda et al., 1989), with lipids (Felgner et al., 1987)
or encapsulated within liposomes (Fraley et al., 1980).

In another embodiment, the polynucleotide encoding an olfactory receptor polypeptide of
the invention or a fragment or a variant thereof may be included in a transfection system comprising
polypeptides that promote its penetration within the host cells as it is described in the PCT
application WO 95/10534 (Seikagaku Corporation). They can also be encapsulated in polymer
microparticles as it is described in the PCT Application No WO 94/27238.

The vector according to the present invention may advantageously be administered in the
form of a gel that facilitates its transfection into the cells. Such a gel composition may be a
complex of poly-L-lysine and lactose, as described by Midoux (1993) or also poloxamer 407 as
described by Pastore (1994). Said vector may also be suspended in a buffer solution or be associated
with liposomes.

The amount of the vector to be injected into the desired host organism varies according to the
site of injection. As an indicative dose, between 0.1 and 100 µg of the vector will be injected into an
animal body, preferably a mammal body, for example a mouse body.

In another embodiment of the vector according to the invention, said vector may be
introduced in vitro in a host cell, preferably in a host cell previously harvested from the animal to be
treated and more preferably a somatic cell such as a muscle cell. In a subsequent step, the cell that
has been transformed with the vector coding for the desired olfactory receptor polypeptide or the
desired fragment or variant thereof is implanted back into the animal body in order to deliver the
recombinant protein within the body either locally or systemically.

Suitable vectors for the in vivo expression of an olfactory receptor polypeptide of the
invention or a fragment or a variant thereof are described hereunder.

In one specific embodiment, the vector is derived from an adenovirus. Preferred
adenovirus vectors according to the invention are those described by Feldman and Steg (1996) or
Ohno et al. (1994). Another preferred recombinant adenovirus according to this specific embodiment
of the present invention is the adenovirus described by Ohwada et al. (1996) or the human
adenovirus type 2 or 5 (Ad 2 or Ad 5) or an adenovirus of animal origin (French patent application
N° FR-93.05954).

Among the adenoviruses of animal origin may be cited the adenoviruses of canine (CAV2,
strain Manhattan or A26/61 [ATCC VR-800]), bovine, murine (Mav1, Beard et al., 1980) or simian
(SAV) origin.

Preferably, the inventors are using recombinant defective adenoviruses that may be prepared
following a technique well known to one skilled in the art, for example as described by Levrero et al.
(1991) or by Graham (1984) or in the European patent application N° EP-185.573. Another
defective recombinant adenovirus that may be used according to the present invention, as well as a
composition of matter containing such a defective recombinant adenovirus, is described in the PCT
application N° WO 95/14785.

Retrovirus vectors and adeno-associated virus vectors are generally understood to be the
recombinant gene delivery system of choice for the transfer of exogenous polynucleotides in vivo,
particularly to mammals, including humans. These vectors provide efficient delivery of genes into
cells, and the transferred nucleic acids are stably integrated into the chromosomal DNA of the host.

The use of recombinant retrovirus vectors containing a nucleic acid according to the 
invention is also encompassed within the scope of the invention. A major prerequisite for the use of
retroviruses is to ensure the safety of their use, particularly with regard to the possibility of the
spread of wild-type virus in the cell population. The development of specialized cell lines (termed
"packaging cells") which produce only replication defective retroviruses has increased the utility of
retroviruses for in vivo gene delivery, and defective retroviruses are well characterized for use in
gene transfer. Thus, recombinant retroviruses can be constructed in which a part of the retroviral
coding sequence (gag, pol, env) has been replaced by nucleic acid encoding an olfactory receptor
rendering the retrovirus defective. Protocols for producing recombinant retroviruses and for 
infecting cells in vitro and in vivo with such viruses can be found in "Current Protocols in Molecular 
Biology" (1989). 

Furthermore, it has been shown that it is possible to limit the infection spectrum of
retroviruses, and consequently of retroviral-based vectors, by modifying the viral packaging proteins
on the surface of the viral particle, as described for example in the PCT Application No WO
93/25234 or in the PCT Application No WO 94/06920. For instance, strategies for the modification
of the infection spectrum of retroviral vectors include: coupling antibodies specific for cell surface
antigens to the viral env protein (Julan et al., 1992) or coupling cell surface receptor ligands to the
viral env protein (Neda et al., 1991). Coupling can be in the form of the chemical cross-linking with
a protein or other variety (e.g. lactose to convert the env protein to an asialoglycoprotein), as well as by
generating fusion proteins (e.g. single-chain antibody/env fusion proteins). This technique, while
useful to limit or otherwise direct the infection to certain tissue types, can also be used to convert an
ecotropic vector into an amphotropic vector.

Particularly preferred retroviruses for the preparation or construction of retroviral in vitro or
in vivo gene delivery vehicles of the present invention include retroviruses selected from the group
consisting of Mink-Cell Focus Inducing Virus, Murine Sarcoma Virus, Reticuloendotheliosis virus
and Rous Sarcoma virus. Particularly preferred Murine Leukemia Viruses include 4070A and 1504A
(Hartley et al., 1976), Abelson (ATCC No VR-999), Friend (ATCC No VR-245), Gross (ATCC No
VR-590), Rauscher (ATCC No VR-998) and Moloney Murine Leukemia Virus (ATCC No VR-190;
PCT Application No WO 94/24298). Particularly preferred Rous Sarcoma Viruses include Bryan
high titer (ATCC Nos VR-334, VR-657, VR-726, VR-659 and VR-728). Another preferred
retroviral vector is that described by Roth et al. (Roth J.A. et al., 1996).

Yet another viral vector system that is contemplated by the invention is the adeno-associated
virus (AAV). Adeno-associated virus is a naturally occurring defective virus that requires
another virus, such as an adenovirus or a herpes virus, as a helper virus for efficient replication and a
productive life cycle (Muzyczka et al., 1992). It is also one of the few viruses that may integrate its
DNA into non-dividing cells, and exhibits a high frequency of stable integration (Flotte et al., 1992;
Samulski et al., 1989; McLaughlin et al., 1989). One advantageous feature of AAV derives from its
reduced efficacy for transducing primary cells relative to transformed cells.

Cell hosts

Another object of the invention consists of host cells that have been transformed or
transfected with one of the polynucleotides described herein, and more precisely a polynucleotide
comprising the coding sequence of any of the olfactory receptor polypeptides having the amino acid
sequence of SEQ ID Nos 12-21 or fragments or variants thereof. Included are host cells that are
transformed (prokaryotic cells) or that are transfected (eukaryotic cells) with a recombinant vector
such as one of those described above.

A recombinant host cell of the invention comprises any one of the polynucleotides or the
recombinant vectors described herein. More particularly, the cell hosts of the present invention can
comprise any of the polynucleotides described in the "Coding regions of the olfactory receptor gene"
section, the "Genomic sequence of olfactory receptor gene" section, the "Oligonucleotide Probes And
Primers" section, the "Polynucleotide constructs" section and the "Expression of an OLF1 to
OLF10 coding polypeptide" section.

Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis, as well as
various species within the genera of Streptomyces or Mycobacterium. Suitable eukaryotic hosts
comprise yeast and insect cells, such as Drosophila and Sf9 cells. Various mammalian cell hosts can also be
employed to express recombinant protein. Examples of mammalian cell hosts include the COS-7
lines of monkey kidney fibroblasts (Guzman, 1981), and other cell lines capable of expressing a
compatible vector, for example the C127, 3T3, CHO, HeLa and BHK cell lines. The selection of a
host is within the ability of one skilled in the art.

Preferred cell hosts used as recipients for the expression vectors of the invention are the
following:

a) Prokaryotic host cells: Escherichia coli strains (i.e. DH5-α strain) or Bacillus subtilis.
b) Eukaryotic host cells: HeLa cells (ATCC N°CCL2; N°CCL2.1; N°CCL2.2), Cv 1 cells
(ATCC N°CCL70), COS cells (ATCC N°CRL1650; N°CRL1651), Sf-9 cells (ATCC N°CRL1711).

The constructs in the host cells can be used in a conventional manner to produce the gene
product encoded by the recombinant sequence.

Following transformation of a suitable host and growth of the host to an appropriate cell
density, the selected promoter is induced by appropriate means, such as temperature shift or
chemical induction, and cells are cultivated for an additional period.

Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and
the resulting crude extract retained for further purification.

Microbial cells employed in expression of proteins can be disrupted by any convenient
method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing
agents. Such methods are well known to the skilled artisan.

Transgenic animals 

The terms "transgenic animals" or "host animals" are used herein to designate animals that
have their genome genetically and artificially manipulated so as to include one of the nucleic acids
according to the invention. Preferred animals are non-human mammals and include those belonging
to a genus selected from Mus (e.g. mice), Rattus (e.g. rats) and Oryctolagus (e.g. rabbits) which have
their genome artificially and genetically altered by the insertion of a nucleic acid according to the
invention.

The transgenic animals of the invention all include within a plurality of their cells a cloned
recombinant or synthetic DNA sequence, more specifically one of the purified or isolated nucleic
acids comprising an olfactory receptor coding sequence selected from the group OLF1 to OLF10, an
olfactory receptor regulatory polynucleotide or a DNA sequence encoding an antisense
polynucleotide such as described in the present specification.

More particularly, transgenic animals according to the invention contain in their somatic
cells and/or in their germ line cells any of the polynucleotides described in the "Coding regions of
the olfactory receptor gene" section, the "Genomic sequence of olfactory receptor gene" section, the
"Oligonucleotide Probes And Primers" section, the "Polynucleotide constructs" section and the
"Expression of an OLF1 to OLF10 coding polypeptide" section.

The replacement of the native genomic olfactory receptor sequence by a defective copy of
said sequence may be performed by techniques of gene targeting. Such techniques are notably
described by Burright et al. (1997), Bates et al. (1997), Mangiarini et al. (1997) and Davies et al. (1997).

Second preferred transgenic animals of the invention have the murine olfactory receptor
gene replaced either by a defective copy of the murine olfactory receptor gene or by an interrupted
copy of the human olfactory receptor gene. A "defective copy" of a murine or a human olfactory
receptor gene is intended to designate a modified copy of these genes that is not transcribed, or is
poorly transcribed, in the resulting recombinant host animal, or a modified copy of these genes leading
to the absence of synthesis of the corresponding translation product, or alternatively leading to a modified
and/or truncated translation product lacking the biological activity of the wild type olfactory receptor
protein. The altered translation product thus contains amino acid modifications, deletions and
substitutions. Modifications and deletions may render the naturally occurring gene nonfunctional,
thus leading to a "knockout animal". These transgenic animals are critical for the creation of animal
models of human diseases, and for eventual treatment of disorders related to alteration of the
olfactory perception of odorant substances or molecules. Examples of such knockout mice are
described in the PCT Applications Nos WO 97/34641, WO 96/12792 and WO 98/02354.

The endogenous murine olfactory receptor gene can be interrupted by the insertion, between
two contiguous nucleotides of said gene, of a part or all of a marker gene placed under the control of
the appropriate promoter, for example the endogenous promoter of the endogenous murine olfactory
receptor gene. The marker gene may be the neomycin resistance gene (neo) that may be operably
linked to the phosphoglycerate kinase-1 (PGK-1) promoter, as described in the PCT Application No
WO 98/02534.

Thus, the invention is also directed to a transgenic animal containing in its somatic cells
and/or in its germ line cells a polynucleotide selected from the following group of
polynucleotides:

a) a defective copy of the human olfactory receptor gene;

b) a defective copy of the endogenous olfactory receptor gene, wherein the expression
"endogenous olfactory receptor gene" designates an olfactory receptor gene that is naturally present
within the genome of the animal host to be genetically modified.

The invention also concerns a method for obtaining transgenic animals, wherein said
method comprises the steps of:

a) replacing the endogenous copy of the animal olfactory receptor gene by a nucleic acid
selected from the group consisting of a defective copy of the human olfactory receptor gene and a
defective copy of the endogenous olfactory receptor gene in animal cells, preferably embryonic stem
cells (ES);

b) introducing the recombinant animal cells obtained at step a) in embryos, notably
blastocysts of the animal;

c) selecting the resulting transgenic animals, for example by detecting the defective copy of
an olfactory receptor gene with one or several primers or probes according to the invention.

Optionally, the transgenic animals may be bred together in order to obtain homozygous 
transgenic animals for the defective copy of the olfactory receptor gene introduced. 

The transgenic animals of the invention thus contain specific sequences of exogenous
genetic material such as the nucleotide sequences described above in detail.

In a preferred embodiment, these transgenic animals may be good experimental models for
studying the diverse pathologies related to disorders associated with alteration of the olfactory
perception of odorant substances or molecules, in particular the transgenic animals
within the genome of which one or several copies of a polynucleotide encoding a
native olfactory receptor protein, or alternatively a mutant olfactory receptor protein, have been inserted.

Third preferred transgenic animals according to the invention contain in their somatic cells
and/or in their germ line cells a polynucleotide selected from the following group of polynucleotides:

a) a purified or isolated nucleic acid encoding an olfactory receptor polypeptide selected
from OLF1 to OLF10, or a polypeptide fragment or variant thereof;

b) a purified or isolated nucleic acid comprising at least 8 consecutive nucleotides of the
nucleotide sequence SEQ ID No 1, a nucleotide sequence complementary thereto or a fragment
or a variant thereof;

c) a purified or isolated nucleic acid comprising a nucleotide sequence selected from the
group of SEQ ID Nos 2-11, a sequence complementary thereto or a fragment or a variant thereof.

The transgenic animals of the invention thus contain specific sequences of exogenous
genetic material such as the nucleotide sequences described above in detail.

In a first preferred embodiment, these transgenic animals may be good experimental models 
in order to screen the candidate substance of interest interacting with the olfactory receptor under 
consideration. 

Since it is possible to produce transgenic animals of the invention using a variety of different
sequences, a general description will be given of the production of transgenic animals by referring
generally to exogenous genetic material. This general description can be adapted by those skilled in
the art in order to incorporate the DNA sequences into animals. For more details regarding the
production of transgenic animals, and specifically transgenic mice, reference may be made to Sandou et
al. (1994) and also to US Patents Nos 4,873,191, issued Oct. 10, 1989, 5,968,766, issued Dec. 16,
1997 and 5,387,742, issued Feb. 28, 1995.

Transgenic animals of the present invention are produced by the application of procedures
which result in an animal with a genome that incorporates exogenous genetic material which is
integrated into the genome. The procedure involves obtaining the genetic material, or a portion
thereof, which encodes either a coding sequence, a non-coding polynucleotide or a DNA sequence
encoding an antisense polynucleotide of an olfactory receptor selected from the group OLF1 to
OLF10 such as described in the present specification.

A recombinant polynucleotide of the invention is inserted into an embryonic stem (ES) cell
line. The insertion is made using electroporation. The cells subjected to electroporation are screened
(e.g. by Southern blot analysis) to find positive cells which have integrated the exogenous recombinant
polynucleotide into their genome. An illustrative positive-negative selection procedure that may be
used according to the invention is described by Mansour et al. (1988). Then, the positive cells are
isolated, cloned and injected into 3.5-day-old blastocysts from mice. The blastocysts are then
inserted into a female host animal and allowed to grow to term. The offspring of the female host
are tested to determine which animals are transgenic, i.e. include the inserted exogenous DNA
sequence, and which are wild-type.

Thus, the present invention also concerns a transgenic animal containing a nucleic acid, a
recombinant expression vector or a recombinant host cell according to the invention.

Recombinant Cell Lines Derived From The Transgenic Animals Of The Invention

A further object of the invention comprises recombinant host cells obtained from a
transgenic animal described herein. In one embodiment the invention encompasses cells derived
from non-human host mammals and animals comprising a recombinant vector of the invention or an
olfactory receptor gene disrupted by homologous recombination with a knock out vector.

Recombinant cell lines may be established in vitro from cells obtained from any tissue of a
transgenic animal according to the invention, for example by transfection of primary cell cultures
with vectors expressing onc-genes such as SV40 large T antigen, as described by Chou (1989) and
Shay et al. (1991).

F. METHODS FOR SCREENING SUBSTANCES OR MOLECULES
INTERACTING WITH AN OLFACTORY RECEPTOR PROTEIN

The present invention pertains to methods for screening substances of interest, in particular
odorant substances or molecules that interact with an olfactory receptor protein selected from the
group consisting of OLF1 to OLF10, or one peptide fragment or variant thereof. In one embodiment,
the candidate substance is devoid of odorant properties but is able to bind the olfactory receptor and
to trigger the transduction of signals.

For the purpose of the present invention, a ligand means a molecule, such as a protein, a
peptide, an antibody or any synthetic chemical compound, capable of binding to the olfactory
receptor protein or one of its fragments or variants, or of modulating the expression of the
polynucleotide coding for the olfactory receptor or a fragment or variant thereof.

In the ligand screening method according to the present invention, a biological sample or a
defined molecule to be tested as a putative ligand of the olfactory receptor protein is brought into
contact with the corresponding purified olfactory receptor protein, for example the corresponding
purified recombinant olfactory receptor protein produced by a recombinant cell host as described
herein, in order to form a complex between this protein and the putative ligand molecule to be tested.

As an illustrative example, to study the interaction of the olfactory receptor protein, or a
fragment comprising any of the fragments described in the section "OLF1 to OLF10
proteins and polypeptide fragments", with drugs or small molecules, such as molecules generated
through combinatorial chemistry approaches, the microdialysis coupled to HPLC method described
by Wang et al. (1997) or the affinity capillary electrophoresis method described by Bush et al.
(1997) can be used.

In further methods, peptides, drugs, fatty acids, lipoproteins, or small molecules which
interact with the olfactory receptor protein, or a fragment comprising any of the fragments described
in the section "OLF1 to OLF10 proteins and polypeptide fragments", may be identified using assays
such as the following. The molecule to be tested for binding is labeled with a detectable label, such
as a fluorescent, radioactive, or enzymatic tag, and placed in contact with immobilized olfactory
receptor protein, or a fragment thereof, under conditions which permit specific binding to occur, such
as affinity columns. In some embodiments, chimeric proteins containing the olfactory receptor
protein fused to proteins facilitating purification, such as glutathione S-transferase (GST), are used.
After removal of non-specifically bound molecules, bound molecules are detected using appropriate
means.

In one embodiment, proteins, peptides, carbohydrates, lipids, or small molecules generated
by combinatorial chemistry interacting with the olfactory receptor protein, or a fragment or a variant
thereof, can also be screened by using an Optical Biosensor as described in Edwards and
Leatherbarrow (1997) and also in Szabo et al. (1995). The main advantage of the method is that it
allows the determination of the association rate between the olfactory receptor protein and molecules
interacting with the olfactory receptor protein. It is thus possible to select specifically ligand
molecules interacting with the olfactory receptor protein, or a fragment thereof, through strong or
conversely weak association constants.

Another object of the present invention comprises methods and kits for the screening of
candidate substances that interact with an olfactory receptor polypeptide.

The present invention pertains to methods for screening substances of interest that interact
with an olfactory receptor protein or one fragment or variant thereof. By their capacity to bind
covalently or non-covalently to an olfactory receptor protein or to a fragment or variant thereof,
these substances or molecules may be advantageously used both in vitro and in vivo. In vitro, said
interacting molecules may be used as detection means in order to identify the presence of an
olfactory receptor protein in a sample, preferably a biological sample.

A first method for the screening of a candidate substance interacting with an olfactory
receptor polypeptide selected from the group consisting of SEQ ID Nos 12-21, or fragments or
variants thereof, comprises the following steps:

a) providing a polypeptide selected from the group consisting of the polypeptides
comprising, consisting essentially of, or consisting of the amino acid sequences of SEQ ID
Nos 12-21, or a peptide fragment or a variant thereof;

b) obtaining a candidate substance;

c) bringing into contact said polypeptide with said candidate substance; and

d) detecting the complexes formed between said polypeptide and said candidate
substance.

Various candidate substances or molecules can be assayed for interaction with an olfactory
receptor polypeptide. These substances or molecules include, without being limited to, natural or
synthetic organic compounds or molecules of biological origin such as polypeptides. When the
candidate substance or molecule comprises a polypeptide, this polypeptide may be the resulting
expression product of either a phage clone belonging to a phage-based random peptide library, or of
a cDNA library cloned in a vector suitable for performing a two-hybrid screening assay.

In one embodiment of the screening method defined above, the complexes formed between
the polypeptide and the candidate substance are further incubated in the presence of a polyclonal or a
monoclonal antibody that specifically binds to the olfactory receptor protein of the invention under
consideration or to said peptide fragment or variant thereof.

In another embodiment of the present screening method, increasing concentrations of a
substance competing for binding to the olfactory receptor with the considered candidate substance are
added, simultaneously or prior to the addition of the candidate substance or molecule, when
performing step c) of said method. By this technique, the detection and optionally the quantification
of the complexes formed between the olfactory receptor protein or the peptide fragment or variant
thereof and the candidate substance or molecule to be screened allows one skilled in the art to
determine the affinity value of said substance or molecule for said olfactory receptor protein or the
peptide fragment or variant thereof.
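
By way of illustration only, the affinity determination mentioned above can be sketched numerically. The text does not specify how the affinity value is computed; the sketch below assumes a simple one-site competitive binding model and the widely used Cheng-Prusoff relation, Ki = IC50 / (1 + [L]/Kd), where [L] and Kd refer to the detected (labeled) substance. All function and parameter names are hypothetical.

```python
# Minimal sketch, assuming one-site competitive binding (an assumption,
# not a method stated in the text). Concentrations share one unit, e.g. nM.


def fraction_complex_remaining(competitor_conc, ic50):
    """Fraction of receptor/candidate complexes still detected at a given
    competitor concentration, relative to the no-competitor control."""
    if competitor_conc < 0 or ic50 <= 0:
        raise ValueError("competitor_conc must be >= 0 and ic50 must be > 0")
    return 1.0 / (1.0 + competitor_conc / ic50)


def cheng_prusoff_ki(ic50, labeled_conc, labeled_kd):
    """Inhibition constant of the competitor from its IC50 (Cheng-Prusoff)."""
    if ic50 <= 0 or labeled_conc < 0 or labeled_kd <= 0:
        raise ValueError("ic50 and labeled_kd must be > 0, labeled_conc >= 0")
    return ic50 / (1.0 + labeled_conc / labeled_kd)


def kd_from_known_competitor(ic50, labeled_conc, competitor_ki):
    """Same relation rearranged: Kd of the detected candidate substance when
    the Ki of the competitor is already known (requires ic50 > competitor_ki)."""
    if competitor_ki <= 0 or ic50 <= competitor_ki:
        raise ValueError("need 0 < competitor_ki < ic50")
    return labeled_conc / (ic50 / competitor_ki - 1.0)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to exercise the functions.
    print(fraction_complex_remaining(50.0, 50.0))
    print(cheng_prusoff_ki(100.0, 10.0, 10.0))
    print(kd_from_known_competitor(100.0, 10.0, 50.0))
```

The rearranged form is included because the passage is concerned with the affinity of the candidate substance itself; it is only meaningful when the competitor's own affinity for the receptor has been established independently.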

The olfactory receptor selected from the group consisting of OLF1 to OLF10, or a peptide
fragment or a variant thereof, can be overexpressed and purified in a bacterial system such as E. coli
as described in Kiefer et al. (1996) and Tucker et al. (1996). The olfactory receptor coding sequence
can be fused at its N-terminus with GST (Glutathione S-transferase) or MBP (Maltose Binding
Protein) and at its C-terminus with a poly-histidine tag, Bio tag or Strep tag for facilitating the
purification of the expressed protein. The Bio tag is 13 amino acid residues long, is biotinylated in
vivo in E. coli, and will therefore bind to both avidin and streptavidin. The Strep tag is 9 amino acid
residues long and binds specifically to streptavidin, but not to avidin. Therewith, a purification step
by affinity can be carried out based on the interaction of a poly-histidine tail with immobilized metal
ions, of the biotinylated Bio tag with monomeric avidin, of the Strep tag with streptavidin, of the
GST segment with glutathione, or of the MBP segment with maltose. Thioredoxin may optionally
be inserted between the receptor C-terminus and the tag and could increase the expression
level. The fusion protein is solubilized in 1% N-lauroyl sarcosine, and 0.2% digitonin is added. It is
purified by affinity chromatography. The MBP, GST or tag segment can then be removed. After the
olfactory receptor protein purification, sarcosyl can be replaced with digitonin, which is a detergent
widely used to stabilize the G protein-coupled receptors. The purified receptor is reconstituted into
lipid vesicles preferably composed of phosphatidylcholine:phosphatidylglycerol (4:1) by adding the
lipid dissolved in dodecyl maltoside and removing the detergent.

The olfactory receptor selected from the group consisting of OLF1 to OLF10, or a peptide
fragment or a variant thereof, can also be overexpressed and purified in a baculovirus/Sf9 system as
described in Nekrasova et al. (1996). The olfactory receptor gene, or a fragment thereof, is
preferably expressed with a "flag" peptide epitope tag and/or a poly-histidine tag at either its N- or
C-terminus for facilitating the purification of the expressed protein. Therefore, the olfactory receptor
gene, or a fragment or a variant thereof, is preferably subcloned into the baculovirus transfer vector
pAcSGHisNT to create constructs that encode olfactory receptor with an amino-terminal
poly-histidine tag. The resulting transfer vector is transfected, preferably with BaculoGold DNA, into Sf9
cells. The expressed olfactory receptors are then solubilized either in 1% N-lauryl sarcosine or 1.5%
lysophosphatidylcholine, but preferably in lysophosphatidylcholine. After solubilization, the
olfactory receptors are purified by affinity chromatography on nickel nitrilotriacetic acid resin and
by cation-exchange chromatography with a carboxymethyl Sepharose cation-exchange column. The
tag segment can then be removed. The purified receptor is reconstituted into lipid vesicles preferably
composed of dimyristoylglycerophosphocholine, cholesterol, dipalmitoylglycerophosphoserine and
dipalmitoylglycerophosphoethanolamine (in molar ratio 54:35:10:1).
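
As a worked illustration of the 54:35:10:1 lipid ratio given above, the following sketch converts the ratio into mole fractions and into the amount of each lipid needed for a chosen total. The ratio is read here as a molar ratio; the total lipid amount in the usage lines is an arbitrary, hypothetical figure, not one taken from the text.

```python
# Minimal sketch: splitting a total lipid amount according to the
# 54:35:10:1 ratio quoted in the text. Function names are hypothetical.

LIPID_RATIO = {
    "dimyristoylglycerophosphocholine": 54,
    "cholesterol": 35,
    "dipalmitoylglycerophosphoserine": 10,
    "dipalmitoylglycerophosphoethanolamine": 1,
}


def mole_fractions(ratio):
    """Return the mole fraction of each component of a parts-based ratio."""
    total_parts = sum(ratio.values())
    if total_parts <= 0:
        raise ValueError("ratio parts must sum to a positive number")
    return {name: parts / total_parts for name, parts in ratio.items()}


def amounts_for_total(ratio, total_amount):
    """Split a total amount (any molar unit, e.g. micromoles) by the ratio."""
    return {name: frac * total_amount
            for name, frac in mole_fractions(ratio).items()}


if __name__ == "__main__":
    # Hypothetical preparation of 20 micromoles of total lipid.
    for name, umol in amounts_for_total(LIPID_RATIO, 20.0).items():
        print(f"{name}: {umol:.2f} umol")
```

Because the four parts sum to 100, each part number can also be read directly as a mole percentage.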

Once the olfactory receptor protein or one of its peptide fragments or variants has been 
obtained as described above, candidate substances or molecules can then be assayed for their 
capacity to bind thereto. 

The candidate substance or molecule to be assayed for interacting with an olfactory receptor
of the invention may be of diverse nature, including, without being limited to, natural or synthetic
organic compounds or molecules of biological origin such as peptides. It can comprise aromatic or
aliphatic compounds with various functional groups such as alcohol, aldehyde, ester, ether, ketone,
carboxylic acid and amine groups. An example of a substance panel which can be used is provided by Zhao et al.
(1998).

The screening of substances or molecules interacting with an olfactory receptor, or a
fragment thereof, is carried out by photoaffinity labeling experiments as described in Kiefer et al.
(1996). The odorant is labeled, preferably radiolabeled, and incubated with lipid vesicles including
the purified olfactory receptor. The odorants bound to the olfactory receptors are crosslinked by
exposure to ultraviolet light. Then, the samples are subjected to SDS polyacrylamide gel
electrophoresis. Proteins are visualized by Coomassie-blue staining and the odorants are revealed,
preferably by autoradiography. In another embodiment, the proteins can be visualized by Western
Blot with a polyclonal or monoclonal antibody that specifically binds to the olfactory receptor under
consideration. Once a substance binding to the considered olfactory receptor is identified, the
binding specificity of this substance is confirmed with competition experiments demonstrating that
increasing concentrations of unlabeled ligand accomplish a dose-dependent displacement of the
radioactive ligand.

The identification of a first substance specific to one of the olfactory receptors of the present
invention facilitates the screening of other substances. Indeed, the binding capacity of the screened
substances to this olfactory receptor can be assessed through competition experiments against
the first identified substance, which is labeled.
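
The competition experiments described above yield, for each concentration of unlabeled substance, the fraction of labeled ligand that remains bound. One common way of summarizing such a displacement series, sketched here purely as an illustration and not as a procedure stated in the text, is to check that the displacement is dose-dependent and to estimate the concentration giving 50% displacement (IC50) by interpolation on a logarithmic concentration scale. The function names and the example data are hypothetical.

```python
# Minimal sketch: summarizing a displacement series. Input fractions are
# bound labeled ligand relative to the no-competitor control (1.0 = 100%).
import math


def is_dose_dependent(bound_fractions):
    """True if bound signal never increases as competitor concentration rises."""
    return all(lo >= hi for lo, hi in zip(bound_fractions, bound_fractions[1:]))


def estimate_ic50(concentrations, bound_fractions):
    """Interpolate, on a log10 concentration axis, the concentration at which
    half of the labeled ligand is displaced. Concentrations must be positive
    and sorted in ascending order. Returns None if 50% is never crossed."""
    if len(concentrations) != len(bound_fractions):
        raise ValueError("inputs must have the same length")
    if any(c <= 0 for c in concentrations):
        raise ValueError("concentrations must be positive")
    points = list(zip(concentrations, bound_fractions))
    for (c_lo, b_lo), (c_hi, b_hi) in zip(points, points[1:]):
        if b_lo >= 0.5 >= b_hi and b_lo != b_hi:
            position = (b_lo - 0.5) / (b_lo - b_hi)
            log_ic50 = math.log10(c_lo) + position * (
                math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_ic50
    return None


if __name__ == "__main__":
    # Hypothetical series: competitor in nM, bound fraction of labeled ligand.
    conc = [1.0, 100.0, 10000.0]
    bound = [0.95, 0.75, 0.25]
    print(is_dose_dependent(bound), estimate_ic50(conc, bound))
```

Comparing the IC50 values obtained for different screened substances against the same labeled reference gives a relative ranking of their binding capacity; a fitted binding model would be needed for anything more quantitative.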
The invention also pertains to kits useful for performing the hereinbefore described
screening method. Preferably, such kits comprise a polypeptide selected from the group consisting of
the polypeptides comprising the amino acid sequences SEQ ID Nos 12-21 or a peptide fragment or a
variant thereof, and optionally means useful to detect the complex formed between the considered
olfactory receptor polypeptide or its peptide fragment or variant and the candidate substance. In a
preferred embodiment, the kit can comprise an already identified substance specific for the olfactory
receptor under consideration which is labeled, preferably radiolabeled, and a monoclonal or
polyclonal antibody directed against the considered olfactory receptor.

A second screening method embodiment consists of a method for the screening of ligand
molecules interacting with an olfactory receptor polypeptide selected from the group consisting of
SEQ ID Nos 12-21, wherein said method comprises:

a) providing a recombinant eukaryotic host cell containing a nucleic acid encoding a
polypeptide selected from the group comprising, consisting essentially of, or consisting of the
polypeptides comprising the amino acid sequences SEQ ID Nos 12-21, or variants or
fragments thereof;

b) preparing membrane extracts of said recombinant eukaryotic host cell;

c) bringing into contact the membrane extracts prepared at step b) with a selected
ligand molecule; and

d) detecting the production level of second messenger metabolites.

The baculovirus-Sf9 cell system enables a foreign DNA encoding an olfactory receptor
selected from the group consisting of OLF1 to OLF10, or a peptide fragment or a variant thereof, to
be expressed with high efficiency. Moreover, it can be used to couple a heterologously expressed
olfactory receptor to the second messenger cascades. Therefore, the binding specificity of an
olfactory receptor can be assessed through an assay of odorant-induced generation of cAMP or
inositol triphosphate (InsP3) described in Raming et al. (1993).
Briefly, a cell line derived from Sf9 is infected by baculovirus, such as baculovirus transfer
vector pVL1393, harboring DNA encoding the olfactory receptor or a fragment thereof downstream
from a strong promoter, preferably the polyhedrin promoter. Recombinant viruses are purified and
used to infect 1.5 x 10^8 Sf9 cells in 100 ml spinner cultures at high multiplicity of infection. Cells are
collected after a postinfection delay, preferably 48 h, and membrane fractions are isolated as follows.
Cells are pelleted (at 250g for 10 min at 4°C), washed with Ringer solution (120 mM NaCl,
5 mM KCl, 1.6 mM K2HPO4, 1.2 mM MgSO4, 25 mM NaHCO3, 5 mM glucose, pH 7.4) and
disrupted using a glass homogenizer in homogenization buffer (10 mM Tris-HCl, pH 8.0, 2 mM
EGTA, 3 mM MgCl2) containing antiproteases. The homogenate is centrifuged and the pellet is
washed. Supernatants are centrifuged at 33,000g for 20 min. The final pellet is resuspended in
homogenization buffer and the protein concentration is determined.
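
For convenience, the Ringer solution recipe quoted above can be turned into weighed amounts. The sketch below is an illustration only: it assumes the anhydrous form of every salt and standard molar masses (hydrated salts would need their own molar masses), and the batch volume in the usage lines is arbitrary. It also shows the cell density implied by 1.5 x 10^8 cells in 100 ml. All names are hypothetical.

```python
# Minimal sketch: grams of each component for a given volume of the Ringer
# solution described in the text. Molar masses (g/mol) assume anhydrous
# reagents, which is an assumption and not stated in the text.

RINGER_MM = {          # concentrations in mM, as given in the text
    "NaCl": 120.0,
    "KCl": 5.0,
    "K2HPO4": 1.6,
    "MgSO4": 1.2,
    "NaHCO3": 25.0,
    "glucose": 5.0,
}

MOLAR_MASS = {         # g/mol, anhydrous forms (assumption)
    "NaCl": 58.44,
    "KCl": 74.55,
    "K2HPO4": 174.18,
    "MgSO4": 120.37,
    "NaHCO3": 84.01,
    "glucose": 180.16,
}


def grams_needed(recipe_mm, volume_l):
    """Mass of each component in grams: (mM / 1000) * litres * g/mol."""
    if volume_l <= 0:
        raise ValueError("volume_l must be positive")
    return {name: mm / 1000.0 * volume_l * MOLAR_MASS[name]
            for name, mm in recipe_mm.items()}


def cells_per_ml(total_cells, culture_volume_ml):
    """Cell density of a suspension culture."""
    return total_cells / culture_volume_ml


if __name__ == "__main__":
    for name, grams in grams_needed(RINGER_MM, 0.5).items():  # 500 ml batch
        print(f"{name}: {grams:.3f} g")
    print(f"{cells_per_ml(1.5e8, 100):.2e} cells/ml")
```

The pH adjustment to 7.4 is not covered by this calculation and would be done on the prepared solution.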

Assay of odorant substance-induced generation of second messengers cAMP and InsP3 is
performed as follows. Suspensions of Sf9 cell membrane preparations (300 µg protein) are rapidly
mixed with a stimulation buffer (200 mM NaCl, 10 mM EGTA, 50 mM MOPS, 2.5 mM MgCl2, 1
mM DTT, 0.05% Na-cholate, 1 mM ATP, 1 µM GTP, and 0.02 µM free Ca2+) containing the
candidate substances at the appropriate concentrations. The reaction is stopped after a short time,
preferably 1 sec, by injecting 10% perchloric acid. Quenched samples are assayed for second
messenger concentrations. The quenched and cooled samples are vortexed and centrifuged for 5 min
at 2500g at 4°C. 400 µl of the supernatants are transferred to a separate tube containing 100 µl of 10
mM EDTA (pH 7). The samples are neutralized by adding 500 µl of a 1:1 (v/v) mixture of 1,1,2
trichlorofluoroethane, followed by thorough mixing. After centrifugation for 2 min at 500g, three
phases are obtained. The upper phase, which contains all water soluble components, is used for
carrying out the concentration measurements. cAMP and InsP3 concentrations are determined
according to the procedures of Steiner et al. (1972) and Palmer et al. (1989), respectively.

The invention also concerns a kit for the screening of odorant ligand molecules interacting 
with an olfactory receptor polypeptide selected from the group consisting of the polypeptides 
comprising the amino acid sequences SEQ ID Nos 12-21, wherein said kit comprises:

a) a recombinant eukaryotic host cell containing a nucleic acid encoding a
polypeptide selected from the group comprising, consisting essentially of, or consisting of
the polypeptides comprising the amino acid sequences SEQ ID Nos 12-21 or variants or
fragments thereof; and

b) optionally, reagents necessary for the measurement of second messenger
metabolites in a sample.

The screening of substances or molecules interacting with an olfactory receptor, or a
fragment thereof, can also be carried out through the measurement of the increase of the response to
odorants in an olfactory epithelium overexpressing an olfactory receptor selected from the group
consisting of OLF1 to OLF10, or a peptide fragment or a variant thereof, as described in Zhao et al.
(1998). The response is assessed by electro-olfactogram, which measures a transepithelial potential
resulting from the summed activity of many olfactory neurons. In order to overexpress the olfactory
receptor, or a fragment thereof, in an olfactory epithelium, an adenovirus containing the olfactory
receptor gene is generated. To aid in electro-olfactogram electrode placement, the olfactory
receptor coding sequence is preferably combined in the adenovirus with the physiological marker
green fluorescent protein (GFP) in such a manner that the two proteins are simultaneously expressed.
The olfactory epithelium of an animal, preferably of a rat, is infected by the adenovirus. Animals are
killed 3 to 8 days after infection and the nasal cavity is opened, exposing the medial surface of the
nasal turbinates. Under fluorescent illumination, the GFP clearly marks the pattern of viral
infection and olfactory receptor expression. Odorant substances are applied to the olfactory
epithelium in the vapor phase by injecting a pressurized pulse of odorant vapor into a continuous
stream of humidified clean air. Electro-olfactogram recordings are obtained with a glass capillary
electrode placed on the surface of the epithelium and connected to a differential amplifier. The
olfactory receptor specificity is assessed from the increase of response in infected animals compared
to uninfected animals. To account for the variability between animals, a standard odorant to which
all other odorant responses are normalized is used.
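
By way of illustration only, the normalization to a standard odorant described above can be sketched
as follows. This is a minimal sketch and not the procedure of Zhao et al. (1998): the function names,
the odorant labels and the amplitude values are hypothetical, and the electro-olfactogram amplitudes
of each animal are assumed to be available as plain numbers.

```python
def normalize_to_standard(responses, standard):
    """Express each EOG amplitude as a multiple of the standard-odorant amplitude."""
    reference = responses[standard]
    if reference == 0:
        raise ValueError("the standard odorant response must be non-zero")
    return {odorant: amplitude / reference for odorant, amplitude in responses.items()}


def relative_increase(infected, uninfected, standard):
    """Ratio of normalized responses (infected / uninfected) for each shared odorant."""
    infected_norm = normalize_to_standard(infected, standard)
    control_norm = normalize_to_standard(uninfected, standard)
    return {
        odorant: infected_norm[odorant] / control_norm[odorant]
        for odorant in infected_norm
        if odorant in control_norm and control_norm[odorant] != 0
    }


# Hypothetical amplitudes (arbitrary units) for one infected and one control animal.
infected_animal = {"standard": 4.0, "odorant_x": 6.0}
control_animal = {"standard": 5.0, "odorant_x": 2.5}
ratios = relative_increase(infected_animal, control_animal, "standard")
```

A ratio well above 1 for a given odorant would suggest that the overexpressed receptor responds to
that odorant, while the standard odorant itself always yields a ratio of 1.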

A third screening method embodiment consists of a method for the screening of ligand
molecules interacting with an olfactory receptor polypeptide selected from the group consisting of
the polypeptides comprising the amino acid sequences SEQ ID Nos 12-21, wherein said method
comprises:

a) providing an adenovirus containing a nucleic acid encoding a polypeptide selected
from the group comprising, consisting essentially of, or consisting of the polypeptides
comprising the amino acid sequences SEQ ID Nos 12-21, or variants or fragments thereof;

b) infecting an olfactory epithelium with said adenovirus;

c) bringing the olfactory epithelium of step b) into contact with a selected ligand molecule;
and

d) detecting the increase of the response to said ligand molecule.

G. METHODS FOR INHIBITING THE EXPRESSION OF AN OLFACTORY RECEPTOR GENE

Other therapeutic compositions according to the present invention advantageously comprise
an oligonucleotide fragment of the nucleic acid sequence of an olfactory receptor as an antisense tool
or a triple helix tool that inhibits the expression of the corresponding olfactory receptor gene. A
preferred fragment of the nucleic acid sequence of an olfactory receptor comprises an allele of at
least one of the biallelic markers A1 to A13.

Antisense Approach 

Preferred methods using antisense polynucleotides according to the present invention are the
procedures described by Sczakiel et al. (1995).

Preferred antisense polynucleotides are described in the section entitled "Nuclear Antisense
DNA Constructs".

The antisense nucleic acids should have a length and melting temperature sufficient to
permit formation of an intracellular duplex having sufficient stability to inhibit the expression of the
olfactory receptor mRNA in the duplex. Strategies for designing antisense nucleic acids suitable for
use in gene therapy are disclosed in Green et al. (1986) and Izant and Weintraub (1984).

In some strategies, antisense molecules are obtained by reversing the orientation of the
olfactory receptor coding region with respect to a promoter so as to transcribe the opposite strand
from that which is normally transcribed in the cell. The antisense molecules may be transcribed
using in vitro transcription systems such as those which employ T7 or SP6 polymerase to generate
the transcript. Another approach involves transcription of olfactory receptor antisense nucleic acids
in vivo by operably linking DNA containing the antisense sequence to a promoter in a suitable
expression vector.
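
As an illustration of what transcribing the opposite strand yields, the antisense transcript has the
sequence of the reverse complement of the coding strand, with uracil in place of thymine. The
following minimal sketch assumes a plain A/C/G/T coding-strand string; the function names and the
six-base example are hypothetical and are not taken from the sequences of the invention.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_rna(coding_strand):
    """Sequence of the RNA transcribed from the strand opposite the normal template."""
    return reverse_complement(coding_strand).upper().replace("T", "U")


# Hypothetical coding-strand fragment; its mRNA would read AUGGCC.
example_antisense = antisense_rna("ATGGCC")
```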

Alternatively, suitable antisense strategies are those described by Rossi et al. (1991), in the
International Applications Nos. WO 94/23026, WO 95/04141, WO 92/18522 and in the European
Patent Application No. EP 0 572 287 A2.

An alternative to the antisense technology that is used according to the present invention
comprises using ribozymes that will bind to a target sequence via their complementary
polynucleotide tail and that will cleave the corresponding RNA by hydrolyzing its target site
(namely "hammerhead ribozymes"). Briefly, the simplified cycle of a hammerhead ribozyme
comprises (1) sequence specific binding to the target RNA via complementary antisense sequences;
(2) site-specific hydrolysis of the cleavable motif of the target strand; and (3) release of cleavage
products, which gives rise to another catalytic cycle. Indeed, the use of long-chain antisense
polynucleotides (at least 30 bases long) or of ribozymes with long antisense arms is advantageous. A
preferred delivery system for antisense ribozymes is achieved by covalently linking these antisense
ribozymes to lipophilic groups or by using liposomes as a convenient vector. Preferred antisense
ribozymes according to the present invention are prepared as described by Sczakiel et al. (1995), the
specific preparation procedures being referred to in said article.

Triple Helix Approach 

The olfactory receptor genomic DNA may also be used to inhibit the expression of the
olfactory receptor gene based on intracellular triple helix formation.

Triple helix oligonucleotides are used to inhibit transcription from a genome. They are 
particularly useful for studying alterations in cell activity when it is associated with a particular 
gene. 

Similarly, a portion of the olfactory receptor genomic DNA can be used to study the effect
of inhibiting olfactory receptor transcription within a cell. Traditionally, homopurine sequences
were considered the most useful for triple helix strategies. However, homopyrimidine sequences can
also inhibit gene expression. Such homopyrimidine oligonucleotides bind to the major groove at
homopurine:homopyrimidine sequences. Thus, both types of sequences from the olfactory receptor
genomic DNA are contemplated within the scope of this invention.

To carry out gene therapy strategies using the triple helix approach, the sequences of the
olfactory receptor genomic DNA are first scanned to identify 10-mer to 20-mer homopyrimidine or
homopurine stretches which could be used in triple-helix based strategies for inhibiting olfactory
receptor expression. Following identification of candidate homopyrimidine or homopurine
stretches, their efficiency in inhibiting olfactory receptor expression is assessed by introducing
varying amounts of oligonucleotides containing the candidate sequences into tissue culture cells
which express the olfactory receptor gene.
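
The scanning step described above can be sketched as follows. This is a minimal illustration under
stated assumptions: the function name and the example sequence are hypothetical, only one strand is
scanned, and maximal runs of at least 10 bases are reported whole, so that a 10-mer to 20-mer
oligonucleotide would subsequently be chosen within each reported run.

```python
import re


def find_triplex_candidates(sequence, min_len=10):
    """Return (type, 0-based start, run) for maximal homopurine or homopyrimidine runs."""
    seq = sequence.upper()
    hits = []
    for label, pattern in (("homopurine", r"[AG]+"), ("homopyrimidine", r"[CT]+")):
        for match in re.finditer(pattern, seq):
            if len(match.group()) >= min_len:
                hits.append((label, match.start(), match.group()))
    # Report candidates in the order in which they occur along the sequence.
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical genomic fragment containing one run of each type.
fragment = "AGGAGAAGGAGACATGCTTCCTCTTCAG"
candidates = find_triplex_candidates(fragment)
```

Scanning the reverse complement as well would recover runs on the other strand; by base pairing,
these coincide in position with the runs of the opposite type found here.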

The oligonucleotides can be introduced into the cells using a variety of methods known to
those skilled in the art, including but not limited to calcium phosphate precipitation, DEAE-Dextran,
electroporation, liposome-mediated transfection or native uptake.

Treated cells are monitored for altered cell function or reduced olfactory receptor expression
using techniques such as Northern blotting, RNase protection assays, or PCR based strategies to
monitor the transcription levels of the olfactory receptor gene in cells which have been treated with
the oligonucleotide.

The oligonucleotides which are effective in inhibiting gene expression in tissue culture cells
may then be introduced in vivo using the techniques described above for the antisense approach, at a
dosage calculated based on the in vitro results.

In some embodiments, the natural (beta) anomers of the oligonucleotide units can be
replaced with alpha anomers to render the oligonucleotide more resistant to nucleases. Further, an
intercalating agent such as ethidium bromide, or the like, can be attached to the 3' end of the alpha
oligonucleotide to stabilize the triple helix. For information on the generation of oligonucleotides
suitable for triple helix formation see Griffin et al. (1989).

H. COMPUTER-RELATED EMBODIMENTS

As used herein the term "nucleic acid codes of the invention" encompasses the nucleotide
sequences comprising, consisting essentially of, or consisting of any of the polynucleotides
described in the "Coding Regions of the olfactory receptor gene" section, the "Genomic sequence of
the olfactory receptor gene" section and the "Oligonucleotide Probes And Primers" section, or variants
thereof, or complementary sequences thereto. Homologous sequences refer to a sequence having at
least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, or 75% homology to these contiguous spans.
Homology may be determined using any method described herein, including BLAST2N with the
default parameters or with any modified parameters. Homologous sequences also may include RNA
sequences in which uridines replace the thymines in the nucleic acid codes of the invention.

As used herein the term "polypeptide codes of the invention" encompasses the polypeptide
sequences comprising any of the polypeptides described in the "OLF1 to OLF10 proteins and
polypeptide fragments" section.

It will be appreciated that the nucleic acid and polypeptide codes of the invention can be
represented in the traditional single character format or three letter format respectively (See the inside
back cover of Stryer, Lubert. Biochemistry, 3rd edition. W. H. Freeman & Co., New York.) or in any
other format or code which records the identity of the nucleotides or the amino acid respectively in a
sequence.

It will be appreciated by those skilled in the art that the nucleic acid codes of the invention and
polypeptide codes of the invention can be stored, recorded, and manipulated on any medium which can
be read and accessed by a computer. As used herein, the words "recorded" and "stored" refer to a
process for storing information on a computer medium. A skilled artisan can readily adopt any of the
presently known methods for recording information on a computer readable medium to generate
manufactures comprising one or more of the nucleic acid codes of the invention, or one or more of the
polypeptide codes of the invention. Another aspect of the present invention is a computer readable
medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 nucleic acid codes of the
invention. Another aspect of the present invention is a computer readable medium having recorded
thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 polypeptide codes of the invention.

Computer readable media include magnetically readable media, optically readable media,
electronically readable media and magnetic/optical media. For example, the computer readable media
may be a hard disc, a floppy disc, a magnetic tape, CD-ROM, DVD, RAM, or ROM as well as other
types of media known to those skilled in the art.

Embodiments of the present invention include systems, particularly computer systems which
contain the sequence information described herein. As used herein, "a computer system" refers to the
hardware components, software components, and data storage components used to store and/or analyze
the nucleotide sequences of the nucleic acid codes of the invention, the amino acid sequences of the
polypeptide codes of the invention, or other sequences. The computer system preferably includes the
computer readable media described above, and a processor for accessing and manipulating the sequence
data.

In some embodiments, the computer system may further comprise a sequence comparer for
comparing the nucleic acid codes or polypeptide codes of the invention stored on a computer readable
medium to reference nucleotide sequences stored on a computer readable medium. A "sequence
comparer" refers to one or more programs which are implemented on the computer system to compare a
nucleotide or polypeptide sequence with other nucleotide or polypeptide sequences and/or compounds
including but not limited to peptides, peptidomimetics, and chemicals the sequences or structures of
which are stored within the data storage means. For example, the sequence comparer may compare the
nucleotide sequences of the nucleic acid codes of the invention or the amino acid sequences of the
polypeptide codes of the invention stored on a computer readable medium to reference sequences stored
on a computer readable medium to identify homologies, motifs implicated in biological function, or
structural motifs. The various sequence comparer programs identified elsewhere in this patent
specification are particularly contemplated for use in this aspect of the invention.

Accordingly, one aspect of the present invention is a computer system comprising a
processor, a data storage device having stored thereon a nucleic acid code of the invention or a
polypeptide code of the invention, a data storage device having retrievably stored thereon reference
nucleotide sequences or polypeptide sequences to be compared to the nucleic acid code of the
invention or polypeptide code of the invention and a sequence comparer for conducting the
comparison. The sequence comparer may indicate a homology level between the sequences
compared or identify structural motifs in the nucleic acid code of the invention and polypeptide
codes of the invention or it may identify structural motifs in sequences which are compared to these
nucleic acid codes and polypeptide codes. In some embodiments, the data storage device may have
stored thereon the sequences of at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of the
invention or polypeptide codes of the invention.

Another aspect of the present invention is a method for determining the level of homology
between a nucleic acid code of the invention and a reference nucleotide sequence, comprising the
steps of reading the nucleic acid code and the reference nucleotide sequence through the use of a
computer program which determines homology levels and determining homology between the nucleic
acid code and the reference nucleotide sequence with the computer program. The computer program
may be any of a number of computer programs for determining homology levels, including those
specifically enumerated herein, including BLAST2N with the default parameters or with any modified
parameters. The method may be implemented using the computer systems described above. The
method may also be performed by reading 2, 5, 10, 15, 20, 25, 30, or 50 of the above described nucleic
acid codes of the invention through the use of the computer program and determining homology
between the nucleic acid codes and reference nucleotide sequences.
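
For illustration, a homology level of the kind recited above can be expressed as a percent identity
over an alignment. The sketch below is not BLAST2N and does not perform any alignment itself: it
assumes that the two sequences have already been aligned to equal length by some other program,
with "-" marking gaps, and the function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns (ignoring gap-gap columns) with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # a column of two gaps carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


def meets_threshold(aligned_a, aligned_b, threshold_percent):
    """True if the percent identity reaches the chosen threshold (e.g. 75 to 99)."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


# Hypothetical 10-base alignment differing at a single position.
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
```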

Alternatively, the computer program may be a computer program which compares the
nucleotide sequences of the nucleic acid codes of the present invention to reference nucleotide
sequences in order to determine whether the nucleic acid code of the invention differs from a reference
nucleic acid sequence at one or more positions. Optionally such a program records the length and
identity of inserted, deleted or substituted nucleotides with respect to the sequence of either the
reference polynucleotide or the nucleic acid code of the invention. In one embodiment, the computer
program may be a program which determines whether the nucleotide sequences of the nucleic acid
codes of the invention contain one or more single nucleotide polymorphisms (SNP) with respect to a
reference nucleotide sequence. These single nucleotide polymorphisms may each comprise a single
base substitution, insertion, or deletion.
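
A program of the kind described above, which records substituted, inserted and deleted nucleotides
relative to a reference, can be sketched as follows. This is a minimal illustration only: it assumes
that the reference and query sequences have already been aligned to equal length with "-" marking
gaps, and the function name and example alignment are hypothetical.

```python
def list_differences(ref_aligned, query_aligned, gap="-"):
    """Return (reference position, kind, reference base, query base) for each difference.

    Positions are 1-based on the ungapped reference; an insertion is reported at the
    position of the last reference base preceding it.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have the same length")
    differences = []
    ref_pos = 0
    for r, q in zip(ref_aligned.upper(), query_aligned.upper()):
        if r != gap:
            ref_pos += 1
        if r == q:
            continue
        if r == gap:
            differences.append((ref_pos, "insertion", "", q))
        elif q == gap:
            differences.append((ref_pos, "deletion", r, ""))
        else:
            differences.append((ref_pos, "substitution", r, q))
    return differences


# Hypothetical alignment with one substitution, one insertion and one deletion.
found = list_differences("ACGT-ACGT", "ACCTTAC-T")
```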

Another aspect of the present invention is a method for determining the level of homology
between a polypeptide code of the invention and a reference polypeptide sequence, comprising the
steps of reading the polypeptide code of the invention and the reference polypeptide sequence through
use of a computer program which determines homology levels and determining homology between the
polypeptide code and the reference polypeptide sequence using the computer program.

Accordingly, another aspect of the present invention is a method for determining whether a
nucleic acid code of the invention differs at one or more nucleotides from a reference nucleotide
sequence comprising the steps of reading the nucleic acid code and the reference nucleotide
sequence through use of a computer program which identifies differences between nucleic acid
sequences and identifying differences between the nucleic acid code and the reference nucleotide
sequence with the computer program. In some embodiments, the computer program is a program
which identifies single nucleotide polymorphisms. The method may be implemented by the
computer systems described above. The method may also be performed by reading at least 2, 5, 10,
15, 20, 25, 30, or 50 of the nucleic acid codes of the invention and the reference nucleotide
sequences through the use of the computer program and identifying differences between the nucleic
acid codes and the reference nucleotide sequences with the computer program.

An "identifier" refers to one or more programs which identify certain features within the
above-described nucleotide sequences of the nucleic acid codes of the invention or the amino acid
sequences of the polypeptide codes of the invention.

In one embodiment, the identifier may comprise a molecular modeling program which
determines the 3-dimensional structure of the polypeptide codes of the invention. In some
embodiments, the molecular modeling program identifies target sequences that are most compatible
with profiles representing the structural environments of the residues in known three-dimensional
protein structures. (See, e.g., Eisenberg et al., U.S. Patent No. 5,436,850 issued July 25, 1995). In
another technique, the known three-dimensional structures of proteins in a given family are
superimposed to define the structurally conserved regions in that family. This protein modeling
technique also uses the known three-dimensional structure of a homologous protein to approximate
the structure of the polypeptide codes of the invention. (See e.g., Srinivasan, et al., U.S. Patent
No. 5,557,535 issued September 17, 1996). Conventional homology modeling techniques have been
used routinely to build models of proteases and antibodies. (Sowdhamini et al., (1997)).
Comparative approaches can also be used to develop three-dimensional protein models when the
protein of interest has poor sequence identity to template proteins. In some cases, proteins fold into
similar three-dimensional structures despite having very weak sequence identities. For example, a
number of helical cytokines fold into a similar three-dimensional topology in spite of weak
sequence homology.

The recent development of threading methods now enables the identification of likely
folding patterns in a number of situations where the structural relatedness between target and
template(s) is not detectable at the sequence level. In hybrid methods, fold recognition is
performed using Multiple Sequence Threading (MST), structural equivalencies are deduced from the
threading output using the distance geometry program DRAGON to construct a low resolution model,
and a full-atom representation is constructed using a molecular modeling package such as
QUANTA. According to this 3-step approach, candidate templates are first identified by using the
novel fold recognition algorithm MST, which is capable of performing simultaneous threading of
multiple aligned sequences onto one or more 3-D structures. In a second step, the structural
equivalencies obtained from the MST output are converted into interresidue distance restraints and
fed into the distance geometry program DRAGON, together with auxiliary information obtained
from secondary structure predictions. The program combines the restraints in an unbiased manner
and rapidly generates a large number of low resolution model conformations. In a third step, these
low resolution model conformations are converted into full-atom models and subjected to energy
minimization using the molecular modeling package QUANTA. (See e.g., Aszodi et al., (1997)).

The results of the molecular modeling analysis may then be used in rational drug design
techniques to identify agents which modulate the activity of the polypeptide codes of the invention.

Accordingly, another aspect of the present invention is a method of identifying a feature
within the nucleic acid codes of the invention or the polypeptide codes of the invention comprising
reading the nucleic acid code(s) or the polypeptide code(s) through the use of a computer program
which identifies features therein and identifying features within the nucleic acid code(s) or
polypeptide code(s) with the computer program. In one embodiment, the computer program comprises a
computer program which identifies open reading frames. In a further embodiment, the computer
program identifies structural motifs in a polypeptide sequence. In another embodiment, the
computer program comprises a molecular modeling program. The method may be performed by
reading a single sequence or at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of the
invention or the polypeptide codes of the invention through the use of the computer program and
identifying features within the nucleic acid codes or polypeptide codes with the computer program.
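
By way of example, a program which identifies open reading frames can be sketched as follows.
This is a deliberately simplified illustration: it scans only the three forward frames, takes the first
ATG in a frame as the start, ends at the first in-frame stop codon, and ignores the reverse strand.
The function name, the minimum length parameter and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=2):
    """Return (start, end, orf) tuples for forward-strand ORFs, 0-based and end-exclusive.

    min_codons counts the codons from ATG up to, but not including, the stop codon.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None
    return sorted(orfs)


# Hypothetical fragment containing a single short ORF in the third forward frame.
orfs_found = find_orfs("CCATGGCCTGATT")
```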

The nucleic acid codes of the invention or the polypeptide codes of the invention may be
stored and manipulated in a variety of data processor programs in a variety of formats. For example,
they may be stored as text in a word processing file, such as Microsoft WORD or WORDPERFECT or
as an ASCII file in a variety of database programs familiar to those of skill in the art, such as DB2,
SYBASE, or ORACLE. In addition, many computer programs and databases may be used as sequence
comparers, identifiers, or sources of reference nucleotide or polypeptide sequences to be compared to
the nucleic acid codes of the invention or the polypeptide codes of the invention. The following list is
intended not to limit the invention but to provide guidance to programs and databases which are useful
with the nucleic acid codes of the invention or the polypeptide codes of the invention. The programs
and databases which may be used include, but are not limited to: MacPattern (EMBL), DiscoveryBase
(Molecular Applications Group), GeneMine (Molecular Applications Group), Look (Molecular
Applications Group), MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI),
BLASTN and BLASTX (Altschul et al., 1990), FASTA (Pearson and Lipman, 1988), FASTDB
(Brutlag et al., 1990), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular Simulations
Inc.), Cerius2.DBAccess (Molecular Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight
II (Molecular Simulations Inc.), Discover (Molecular Simulations Inc.), CHARMm (Molecular
Simulations Inc.), Felix (Molecular Simulations Inc.), DelPhi (Molecular Simulations Inc.),
QuanteMM (Molecular Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler
(Molecular Simulations Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular
Simulations Inc.), WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular
Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular Simulations Inc.),
the EMBL/Swissprotein database, the MDL Available Chemicals Directory database, the MDL Drug
Data Report database, the Comprehensive Medicinal Chemistry database, Derwent's World Drug
Index database, the BioByteMasterFile database, the Genbank database, and the Genseqn database.
Many other programs and databases would be apparent to one of skill in the art given the present
disclosure.

Motifs which may be detected using the above programs include sequences encoding
leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices, and
beta sheets, signal sequences encoding signal peptides which direct the secretion of the encoded
proteins, sequences implicated in transcription regulation such as homeoboxes, acidic stretches,
enzymatic active sites, substrate binding sites, and enzymatic cleavage sites.
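
As one concrete illustration of motif detection, potential N-glycosylation sites can be located in a
polypeptide sequence by searching for the widely used consensus pattern N-{P}-[ST]-{P}
(asparagine, any residue except proline, serine or threonine, any residue except proline). The sketch
below is not one of the programs listed above; the function name and the example peptide are
hypothetical, and a match indicates only a candidate site.

```python
import re

# A lookahead is used so that overlapping candidate sites are all reported.
_SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glycosylation_sites(protein):
    """Return (1-based position of the asparagine, matched tetrapeptide) pairs."""
    return [(m.start() + 1, m.group(1)) for m in _SEQUON.finditer(protein.upper())]


# Hypothetical peptide: N2 and N11 match, N7 is followed by proline and does not.
sites = find_n_glycosylation_sites("MNGSAANPTLNKTQ")
```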

Throughout this application, various publications, patents and published patent applications
are cited. The disclosures of these publications, patents and published patent specifications
referenced in this application are hereby incorporated by reference into the present disclosure to
more fully describe the state of the art to which this invention pertains.

EXAMPLES 

EXAMPLE 1 : LOCALIZATION OF THE OLFACTORY RECEPTOR GENES OLF3
AND OLF5 ON THE HUMAN CHROMOSOMES.

Metaphase chromosome preparation 

Metaphase chromosomes were prepared from phytohemagglutinin (PHA)-stimulated blood
cells of donors. PHA-stimulated lymphocytes from healthy males were cultured for 72 h in RPMI-1640
medium. For synchronization, methotrexate (10 µM) was added for 17 h, followed by addition of
5-bromodeoxyuridine (5-BrdU, 0.1 mM) for 6 h. Colcemid (1 mg/ml) was added for the last 15 min
before harvesting the cells. Cells were collected, washed in RPMI, incubated with a hypotonic
solution of KCl (75 mM) at 37°C for 15 min and fixed in three changes of methanol:acetic acid
(3:1). The cell suspension was dropped onto a glass slide, air-dried and kept in darkness at -20°C
until use.

Probes: 

- The BAC H0526H04 containing the Olf3 and Olf5 genes was used to generate a probe by
Alu-PCR. PCR amplification of BAC recombinant DNA (50 ng) was carried out as described by Romana
et al. (1993).

- Two DNA fragments carrying respectively Olf3 and Olf5 sequences were generated by
long range PCR with specific primers (SEQ ID 96-99) and used as probes to confirm the localization
of each gene. The Olf3 and Olf5 amplicons are 2.8 kb and 3.2 kb fragments, respectively.

Probes were labeled by nick translation with bio-16-dUTP (Boehringer Mannheim), and
purified over a Sephadex G50 column.

Fluorescence In Situ Hybridization 

To determine the chromosomal localization of both genes, the BAC probe was initially
hybridized to human metaphase cells. When biotinylated PCR products of BAC DNA were used in
the hybridization experiment, 75 ng of probe was precipitated with 75 µg of competitor DNA (human
Cot1 DNA, GIBCO-BRL) and resuspended in 10 µl of hybridization buffer (50% formamide, 2 X
SSC, 10% dextran sulfate, 1 mg/ml sonicated herring DNA, pH 7). When long range PCR products
of the Olf3 or Olf5 genes were used as probe, 5 ng of biotinylated probe were mixed with 5 µg of
human Cot1 DNA. Prior to hybridization, the probe was denatured at 70°C for 10 min and
preannealed at 37°C for 2 h.

Slides were treated for 1 h at 37°C with RNase A (100 µg/ml), rinsed three times in 2 X SSC
and dehydrated in an ethanol series. Chromosome preparations were denatured in 70% formamide, 2
X SSC (pH 7), for 2 min at 70°C, then dehydrated at 4°C. The slides were treated with proteinase K
(10 µg/ml in 20 mM Tris-HCl, 2 mM CaCl2) at 37°C for 8-10 min and dehydrated. After
preannealing, the hybridization mixture containing the probe was placed on the slide, covered with a
coverslip, sealed with rubber cement and incubated overnight in a humid chamber at 37°C. After
hybridization and post hybridization washes, the biotinylated probe was detected by avidin-FITC (5
µg/ml, Vector Laboratories) and amplified once with additional layers of biotinylated goat
anti-avidin (5 µg/ml, Vector Laboratories) and avidin-FITC. For chromosomal localization, fluorescent
R-bands were obtained as described by Cherif et al. (1990). The slides were observed under a
LEICA fluorescent microscope (DMRXA). Chromosomes were counterstained with propidium
iodide and the fluorescent signal of the probe appeared as two symmetrical yellow-green spots on
both chromatids of the fluorescent R-band chromosome.

Localization 

A specific signal (a double yellow-green spot) was observed on band 11q12-q13 on at least
one chromosome 11 in >80% of the metaphases with all the probes.

EXAMPLE 2 : IDENTIFICATION OF BIALLELIC MARKERS: DNA 
EXTRACTION 

Donors were unrelated and healthy. They presented a sufficient diversity for being representative of a
French heterogeneous population. The DNA from 100 individuals was extracted and tested for the
detection of the biallelic markers.

30 ml of peripheral venous blood were taken from each donor in the presence of EDTA.
Cells (pellet) were collected after centrifugation for 10 minutes at 2000 rpm. Red cells were lysed by
a lysis solution (50 ml final volume: 10 mM Tris pH 7.6; 5 mM MgCl2; 10 mM NaCl). The solution
was centrifuged (10 minutes, 2000 rpm) as many times as necessary to eliminate the residual red
cells present in the supernatant, after resuspension of the pellet in the lysis solution.

The pellet of white cells was lysed overnight at 42°C with 3.7 ml of lysis solution composed
of:

- 3 ml TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM) / NaCl 0.4 M
- 200 µl SDS 10%
- 500 µl K-proteinase (2 mg K-proteinase in TE 10-2 / NaCl 0.4 M).

For the extraction of proteins, 1 ml saturated NaCl (6M) (1/3.5 v/v) was added. After
vigorous agitation, the solution was centrifuged for 20 minutes at 10000 rpm.

For the precipitation of DNA, 2 to 3 volumes of 100% ethanol were added to the previous
supernatant, and the solution was centrifuged for 30 minutes at 2000 rpm. The DNA solution was
rinsed three times with 70% ethanol to eliminate salts, and centrifuged for 20 minutes at 2000 rpm.
The pellet was dried at 37°C, and resuspended in 1 ml TE 10-1 or 1 ml water. The DNA
concentration was evaluated by measuring the OD at 260 nm (1 unit OD = 50 µg/ml DNA).

To determine the presence of proteins in the DNA solution, the OD 260 / OD 280 ratio was
determined. Only DNA preparations having an OD 260 / OD 280 ratio between 1.8 and 2 were used
in the subsequent examples described below.
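
The quantification and purity criteria above can be sketched in a few lines of Python. This is only an illustration: the function names and the dilution-factor parameter are assumptions introduced here, and the conversion factor is the standard value of 50 µg/ml per OD unit at 260 nm for double-stranded DNA.

```python
# Illustrative sketch; function names and the dilution parameter are hypothetical.
UG_PER_ML_PER_OD260 = 50.0  # standard factor for double-stranded DNA


def dna_concentration_ug_per_ml(od260, dilution_factor=1.0):
    """Estimate DNA concentration (µg/ml) from the absorbance at 260 nm."""
    if od260 < 0 or dilution_factor <= 0:
        raise ValueError("absorbance must be non-negative and dilution positive")
    return od260 * UG_PER_ML_PER_OD260 * dilution_factor


def passes_purity_check(od260, od280, low=1.8, high=2.0):
    """Return True when the OD260/OD280 ratio lies within [low, high]."""
    if od280 <= 0:
        raise ValueError("OD280 must be positive")
    return low <= od260 / od280 <= high


if __name__ == "__main__":
    # A sample read at OD260 = 0.5 after a 20-fold dilution
    print(dna_concentration_ug_per_ml(0.5, dilution_factor=20))
    print(passes_purity_check(0.95, 0.50))
```

A preparation failing the ratio test would simply be excluded from the subsequent examples.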

The pool was constituted by mixing equivalent quantities of DNA from each individual. 

EXAMPLE 3 : IDENTIFICATION OF BIALLELIC MARKERS: AMPLIFICATION 
OF GENOMIC DNA BY PCR 

The amplification of specific genomic sequences of the DNA samples of example 2 was
carried out on the pool of DNA obtained previously. In addition, 50 individual samples were
similarly amplified.

PCR assays were performed using the following protocol: 

Final volume                                            25 µl
DNA                                                     2 ng/µl
MgCl2                                                   2 mM
dNTP (each)                                             200 µM
primer (each)                                           2.9 ng/µl
Ampli Taq Gold DNA polymerase                           0.05 unit/µl
PCR buffer (10x = 0.1 M Tris-HCl pH 8.3, 0.5 M KCl)     1x
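
As a worked illustration of the protocol above, the absolute amount of each component in one 25 µl reaction follows directly from the final concentrations. The sketch below is plain arithmetic; the dictionary keys and function name are illustrative, and no stock concentrations are assumed.

```python
# Illustrative arithmetic only; names are hypothetical.
FINAL_VOLUME_UL = 25.0

# Final concentrations from the protocol, expressed per microlitre of reaction.
PER_UL = {
    "template DNA (ng)": 2.0,
    "each primer (ng)": 2.9,
    "AmpliTaq Gold (units)": 0.05,
    "each dNTP (pmol)": 200.0,  # 200 µM equals 200 pmol per µl
    "MgCl2 (nmol)": 2.0,        # 2 mM equals 2 nmol per µl
}


def amounts_per_reaction(volume_ul=FINAL_VOLUME_UL):
    """Multiply each per-µl concentration by the reaction volume."""
    return {name: round(per_ul * volume_ul, 6) for name, per_ul in PER_UL.items()}


if __name__ == "__main__":
    for name, amount in amounts_per_reaction().items():
        print(f"{name}: {amount}")
```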



Each pair of first primers was designed using the sequence information of the olfactory
receptor gene cluster disclosed herein and the OSP software (Hillier & Green, 1991). This first pair
of primers was about 20 nucleotides in length and had the sequences disclosed in Table 1 in the
columns labeled PU and RP.



Table 1

Amplicon    Position range of     Primer   Position range of      Primer   Complementary position
            the amplicon in       name     amplification primer   name     range of amplification
            SEQ ID No 1           RP       in SEQ ID No 1         PU       primer in SEQ ID No 1

99-13670    7362     7824         B1       7362     7380          C1       7805     7824
99-13669    8120     8662         B2       8120     8140          C2       8643     8662
99-13666    14308    14757        B3       14308    14328         C3       14740    14757
99-13664    19346    19845        B4       19346    19366         C4       19826    19845
99-13663    20298    20800        B5       20298    20318         C5       20781    20800
99-13660    76752    77223        B6       76752    76772         C6       77205    77223
99-13652    90967    91494        B7       90967    90987         C7       91474    91494
99-13671    133925   134393       B8       133925   133945        C8       134375   134393
99-13649    139807   140351       B9       139807   139826        C9       140331   140351
99-13648    140912   141434       B10      140912   140932        C10      141416   141434
99-13647    143828   144309       B11      143828   143847        C11      144292   144309
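
The position ranges in Table 1 can be cross-checked programmatically. The following sketch computes amplicon and primer lengths from the 1-based inclusive ranges and verifies that each primer pair sits at the two ends of its amplicon. The data are transcribed from Table 1; the function names are illustrative.

```python
# (amplicon, amplicon range, B primer range, C primer range); 1-based, inclusive
TABLE_1 = [
    ("99-13670", (7362, 7824), (7362, 7380), (7805, 7824)),
    ("99-13669", (8120, 8662), (8120, 8140), (8643, 8662)),
    ("99-13666", (14308, 14757), (14308, 14328), (14740, 14757)),
    ("99-13664", (19346, 19845), (19346, 19366), (19826, 19845)),
    ("99-13663", (20298, 20800), (20298, 20318), (20781, 20800)),
    ("99-13660", (76752, 77223), (76752, 76772), (77205, 77223)),
    ("99-13652", (90967, 91494), (90967, 90987), (91474, 91494)),
    ("99-13671", (133925, 134393), (133925, 133945), (134375, 134393)),
    ("99-13649", (139807, 140351), (139807, 139826), (140331, 140351)),
    ("99-13648", (140912, 141434), (140912, 140932), (141416, 141434)),
    ("99-13647", (143828, 144309), (143828, 143847), (144292, 144309)),
]


def span_length(pos_range):
    """Length of a 1-based inclusive position range."""
    start, end = pos_range
    return end - start + 1


def primers_flank_amplicon(amplicon, first, second):
    """True when the primers start and end exactly at the amplicon boundaries."""
    return first[0] == amplicon[0] and second[1] == amplicon[1]


def summarize(rows=TABLE_1):
    """Map amplicon name to (amplicon length, B primer length, C primer length)."""
    return {name: (span_length(amp), span_length(b), span_length(c))
            for name, amp, b, c in rows}


if __name__ == "__main__":
    for name, lengths in summarize().items():
        print(name, lengths)
```

With these data every primer is 18 to 21 nucleotides long, in line with the stated length of about 20 nucleotides.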



Preferably, the primers contained a common oligonucleotide tail upstream of the specific
bases targeted for amplification which was useful for sequencing.

Primers PU contain the following additional PU 5' sequence:
TGTAAAACGACGGCCAGT; primers RP contain the following RP 5' sequence:
CAGGAAACAGCTATGACC. The primer containing the additional PU 5' sequence is listed in
SEQ ID No 26. The primer containing the additional RP 5' sequence is listed in SEQ ID No 27.
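
The construction of a tailed primer can be sketched as follows. The two tail sequences are those given above; the template string is a toy sequence invented for illustration (it is not taken from SEQ ID No 1), and the helper names are assumptions. A primer defined by a complementary position range is taken here as the reverse complement of the template segment.

```python
PU_TAIL = "TGTAAAACGACGGCCAGT"  # additional PU 5' sequence
RP_TAIL = "CAGGAAACAGCTATGACC"  # additional RP 5' sequence

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_from_range(template, start, end, complementary=False):
    """Extract a primer from 1-based inclusive positions of the template.

    With complementary=True the primer anneals to the given strand, so the
    reverse complement of the segment is returned.
    """
    if not 1 <= start <= end <= len(template):
        raise ValueError("position range outside template")
    segment = template[start - 1:end]
    return reverse_complement(segment) if complementary else segment


def add_tail(tail, specific):
    """Prepend the common 5' tail to the target-specific bases."""
    return tail + specific.upper()


if __name__ == "__main__":
    toy_template = "ATGCCGTAAGCTTGGA"  # hypothetical sequence
    forward = primer_from_range(toy_template, 1, 4)
    reverse = primer_from_range(toy_template, 13, 16, complementary=True)
    print(add_tail(PU_TAIL, forward))
    print(add_tail(RP_TAIL, reverse))
```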

The synthesis of these primers was performed following the phosphoramidite method, on a
GENSET UFPS 24.1 synthesizer.

DNA amplification was performed on a Genius II thermocycler. After heating at 95°C for 10
min, 40 cycles were performed. Each cycle comprised: 30 sec at 95°C, 54°C for 1 min, and 30 sec at
72°C. For final elongation, 10 min at 72°C ended the amplification. The quantities of the
amplification products obtained were determined on 96-well microtiter plates, using a fluorometer
and Picogreen as intercalating agent (Molecular Probes).
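
For illustration, the cycling program described above can be written as a small data structure, from which the total programmed hold time follows. The structure and names are assumptions made for this sketch; ramp times between temperatures are not included.

```python
# Temperatures in °C, durations in seconds; layout is illustrative.
PCR_PROGRAM = {
    "initial_hold": (95, 600),
    "cycles": 40,
    "cycle_steps": [(95, 30), (54, 60), (72, 30)],
    "final_hold": (72, 600),
}


def total_hold_seconds(program):
    """Sum all programmed hold times, ignoring temperature ramping."""
    per_cycle = sum(seconds for _, seconds in program["cycle_steps"])
    return (program["initial_hold"][1]
            + program["cycles"] * per_cycle
            + program["final_hold"][1])


if __name__ == "__main__":
    print(total_hold_seconds(PCR_PROGRAM) / 60, "minutes")
```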

EXAMPLE 4 : IDENTIFICATION OF BIALLELIC MARKERS: SEQUENCING OF 
AMPLIFIED GENOMIC DNA AND IDENTIFICATION OF POLYMORPHISMS. 

The sequencing of the amplified DNA obtained in example 3 was carried out on ABI 377
sequencers. The sequences of the amplification products were determined using automated dideoxy
terminator sequencing reactions with a dye terminator cycle sequencing protocol. The products of
the sequencing reactions were run on sequencing gels and the sequences were determined using gel
image analysis (ABI Prism DNA Sequencing Analysis software (2.1.2 version)).

The sequence data were further evaluated using the above mentioned polymorphism analysis
software designed to detect the presence of biallelic markers among the pooled amplified fragments.
The polymorphism search was based on the presence of superimposed peaks in the electrophoresis
pattern resulting from different bases occurring at the same position as described previously.

11 fragments of amplification were analyzed. In these segments, 13 biallelic markers
referred to as A1 to A13 in the BM column were detected. The localization of these biallelic markers
is as shown in Table 2.



Table 2

Amplicon    BM     Marker Name      Localization in OLF       Polymorphism   BM position in
                                    gene cluster                             SEQ ID No 1

99-13670    A1     99-13670-305     Between Orf1 and Orf2     A/C            7521
99-13669    A2     99-13669-471     Between Orf1 and Orf2     A/C            8192
99-13666    A3     99-13666-275     Between Orf2 and Orf3     A/T            14483
99-13664    A4     99-13664-221     Between Orf2 and Orf3     A/G            19625
99-13663    A5     99-13663-218     Between Orf2 and Orf3     C/T            20583
99-13660    A6     99-13660-277     Between Orf4 and Orf5     G/T            76947
99-13652    A7     99-13652-407     Between Orf5 and Orf6     G/C            91088
99-13652    A8     99-13652-357     Between Orf5 and Orf6     C/T            91138
99-13652    A9     99-13652-308     Between Orf5 and Orf6     C/T            91187
99-13671    A10    99-13671-396     Between Orf9 and Orf10    C/T            133998
99-13649    A11    99-13649-286     Between Orf9 and Orf10    A/G            140066
99-13648    A12    99-13648-259     Between Orf9 and Orf10    C/T            141176
99-13647    A13    99-13647-278     After Orf10               C/T            144033



Table 3

BM     Marker Name      Position range of probes    Probes
                        in SEQ ID No 1

A1     99-13670-305     7498      7544              P1
A2     99-13669-471     8169      8215              P2
A3     99-13666-275     14460     14506             P3
A4     99-13664-221     19602     19648             P4
A5     99-13663-218     20560     20606             P5
A6     99-13660-277     76924     76970             P6
A7     99-13652-407     91065     91111             P7
A8     99-13652-357     91115     91161             P8
A9     99-13652-308     91164     91210             P9
A10    99-13671-396     133975    134021            P10
A11    99-13649-286     140043    140089            P11
A12    99-13648-259     141153    141199            P12
A13    99-13647-278     144010    144056            P13
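
The probe ranges listed above are consistent with 47-nucleotide probes centered on the polymorphic base, that is, 23 nucleotides on each side of the marker position in SEQ ID No 1. A minimal sketch of that arithmetic follows; the function name is illustrative and the centering rule is inferred from the tabulated positions rather than stated as a general design rule.

```python
def probe_range(marker_position, probe_length=47):
    """Return the (start, end) range of a probe centered on the marker.

    Positions are 1-based and inclusive; an odd length keeps the marker
    exactly at the center.
    """
    if probe_length % 2 == 0:
        raise ValueError("probe length must be odd to center the marker")
    flank = probe_length // 2
    return marker_position - flank, marker_position + flank


if __name__ == "__main__":
    # Marker positions of A1 and A13 in SEQ ID No 1
    print(probe_range(7521))
    print(probe_range(144033))
```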

EXAMPLE 5 : VALIDATION OF THE POLYMORPHISMS THROUGH
MICROSEQUENCING

The biallelic markers identified in example 4 were further confirmed and their respective
frequencies were determined through microsequencing. Microsequencing was carried out for each
individual DNA sample described in Example 2.

Amplification from genomic DNA of individuals was performed by PCR as described above
for the detection of the biallelic markers with the same set of PCR primers (Table 1).

The preferred primers used in microsequencing were about 19 nucleotides in length and
hybridized just upstream of the considered polymorphic base. According to the invention, the
primers used in microsequencing are detailed in Table 4.

Table 4

Marker Name      BM     Mis. 1   Position range of          Mis. 2   Complementary position
                                 microsequencing primer              range of microsequencing
                                 mis. 1 in SEQ ID No 1               primer mis. 2 in SEQ ID No 1

99-13670-305     A1     D1       7502      7520             E1       7522      7540
99-13669-471     A2     D2       8173      8191             E2       8193      8211
99-13666-275     A3     D3       14464     14482            E3       14484     14502
99-13664-221     A4     D4       19606     19624            E4       19626     19644
99-13663-218     A5     D5       20564     20582            E5       20584     20602
99-13660-277     A6     D6       76928     76946            E6       76948     76966
99-13652-407     A7     D7       91069     91087            E7       91089     91107
99-13652-357     A8     D8       91119     91137            E8       91139     91157
99-13652-308     A9     D9       91168     91186            E9       91188     91206
99-13671-396     A10    D10      133979    133997           E10      133999    134017
99-13649-286     A11    D11      140047    140065           E11      140067    140085
99-13648-259     A12    D12      141157    141175           E12      141177    141195
99-13647-278     A13    D13      144014    144032           E13      144034    144052



Mis. 1 and Mis. 2 respectively refer to microsequencing primers which hybridized with the
non-coding strand of the olfactory receptor gene or with the coding strand of the olfactory receptor
gene.
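
The positions in Table 4 follow a simple pattern: each microsequencing primer is 19 nucleotides long and ends immediately adjacent to the polymorphic base, Mis. 1 covering the 19 positions before the marker and Mis. 2 the 19 positions after it (the latter given as a complementary range). A short sketch of that arithmetic, with an illustrative function name:

```python
def microsequencing_primer_ranges(marker_position, primer_length=19):
    """Return ((mis1_start, mis1_end), (mis2_start, mis2_end)).

    Both ranges are 1-based and inclusive and abut the marker position
    without including it.
    """
    if primer_length < 1:
        raise ValueError("primer length must be positive")
    mis1 = (marker_position - primer_length, marker_position - 1)
    mis2 = (marker_position + 1, marker_position + primer_length)
    return mis1, mis2


if __name__ == "__main__":
    # Marker A1 at position 7521 of SEQ ID No 1
    print(microsequencing_primer_ranges(7521))
```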

The microsequencing reaction was performed as follows : 

After purification of the amplification products, the microsequencing reaction mixture was
prepared by adding, in a 20 µl final volume: 10 pmol microsequencing oligonucleotide, 1 U
Thermosequenase (Amersham E79000G), 1.25 µl Thermosequenase buffer (260 mM Tris-HCl pH
9.5, 65 mM MgCl2), and the two appropriate fluorescent ddNTPs (Perkin Elmer, Dye Terminator Set
401095) complementary to the nucleotides at the polymorphic site of each biallelic marker tested,
following the manufacturer's recommendations. After 4 minutes at 94°C, 20 PCR cycles of 15 sec at
55°C, 5 sec at 72°C, and 10 sec at 94°C were carried out in a Tetrad PTC-225 thermocycler (MJ
Research). The unincorporated dye terminators were then removed by ethanol precipitation. Samples
were finally resuspended in formamide-EDTA loading buffer and heated for 2 min at 95°C before
being loaded on a polyacrylamide sequencing gel. The data were collected by an ABI PRISM 377
DNA sequencer and processed using the GENESCAN software (Perkin Elmer).

Following gel analysis, data were automatically processed with software that allows the
determination of the alleles of biallelic markers present in each amplified fragment.

The software evaluates such factors as whether the intensities of the signals resulting from
the above microsequencing procedures are weak, normal, or saturated, or whether the signals are
ambiguous. In addition, the software identifies significant peaks (according to shape and height
criteria). Among the significant peaks, peaks corresponding to the targeted site are identified based
on their position. When two significant peaks are detected for the same position, each sample is
classified as homozygous or heterozygous based on the height ratio.
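
The classification step described above can be illustrated with a minimal sketch. The thresholds used here (a minimum peak height of 100 and a heterozygote height ratio of 0.3) are hypothetical values chosen for the example; the actual criteria applied by the software are not specified in this document.

```python
def call_genotype(allele_1, height_1, allele_2, height_2,
                  min_height=100.0, het_ratio=0.3):
    """Classify a sample from the peak heights of the two possible alleles.

    min_height and het_ratio are illustrative thresholds, not documented
    values. Returns a string such as "A/C", "A/A" or "no call".
    """
    tallest = max(height_1, height_2)
    if tallest <= 0 or tallest < min_height:
        return "no call"  # neither peak is significant
    ratio = min(height_1, height_2) / tallest
    both_significant = min(height_1, height_2) >= min_height
    if both_significant and ratio >= het_ratio:
        return f"{allele_1}/{allele_2}"  # heterozygous
    dominant = allele_1 if height_1 >= height_2 else allele_2
    return f"{dominant}/{dominant}"  # homozygous


if __name__ == "__main__":
    print(call_genotype("A", 1000, "C", 900))
    print(call_genotype("A", 1000, "C", 50))
```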

While the preferred embodiment of the invention has been illustrated and described, it will
be appreciated that various changes can be made therein by one skilled in the art without
departing from the spirit and scope of the invention.

SEQUENCE LISTING FREE TEXT

The following free text appears in the accompanying Sequence Listing : 

open reading frame 

ubiquitin 1 pseudogene complement 

ubiquitin 2 pseudogene complement 

polymorphic base 

or 

complement 
probe 

sequencing oligonucleotide PrimerPU 
sequencing oligonucleotide PrimerRP 



WO 00/21985 PCT/IB99/01729 


What is claimed: 

1. An isolated, purified, or recombinant polynucleotide comprising a contiguous span of at
least 12 nucleotides of SEQ ID No 1 or the complements thereof, wherein said contiguous span
comprises at least 1 of the following nucleotide positions of SEQ ID No 1: 1-113643,
114064-127488, 127855-144460.



2. An isolated, purified, or recombinant polynucleotide comprising a contiguous span of at
least 12 nucleotides of a sequence selected from the group consisting of SEQ ID Nos 2-11 or the
complements thereof.

3. An isolated, purified, or recombinant polynucleotide consisting essentially of a
contiguous span of 8 to 50 nucleotides of SEQ ID No 1 or the complement thereof, wherein said
span includes an olfactory receptor-related biallelic marker in said sequence.

4. A polynucleotide according to claim 3, wherein said olfactory receptor-related biallelic
marker is selected from the group consisting of A1 to A13, and the complements thereof.

5. A polynucleotide according to claims 3 or 4, wherein said contiguous span is 18 to 47
nucleotides in length and said biallelic marker is within 4 nucleotides of the center of said
polynucleotide.

6. A polynucleotide according to claim 5, wherein said polynucleotide consists essentially
of a sequence selected from the following sequences: P1 to P13, and the complementary sequences
thereto.

7. A polynucleotide according to any one of claims 1, 2 or 3, wherein the 3' end of said
contiguous span is present at the 3' end of said polynucleotide.

8. A polynucleotide according to claims 3 or 4, wherein the 3' end of said contiguous span is
located at the 3' end of said polynucleotide and said biallelic marker is present at the 3' end of said
polynucleotide.



9. An isolated, purified, or recombinant polynucleotide consisting essentially of a contiguous
span of 8 to 50 nucleotides of SEQ ID No 1 or the complement thereof, wherein the 3' end of said
contiguous span is located at the 3' end of said polynucleotide, and wherein the 3' end of said
polynucleotide is located within 20 nucleotides upstream of an olfactory receptor-related biallelic
marker in said sequence.

10. A polynucleotide according to claim 9, wherein the 3' end of said polynucleotide is
located 1 nucleotide upstream of said olfactory receptor-related biallelic marker in said sequence.

11. A polynucleotide according to claim 10, wherein said polynucleotide consists
essentially of a sequence selected from the following sequences: D1 to D13, and E1 to E13.

12. A polynucleotide according to claim 7 consisting essentially of a sequence selected from
the following sequences: B1 to B11 and C1 to C11.

13. An isolated, purified, or recombinant polynucleotide which encodes a polypeptide
comprising a contiguous span of at least 6 amino acids of a sequence selected from the group
consisting of SEQ ID Nos 12-21.

14. A polynucleotide for use in a genotyping assay for determining the identity of the 
nucleotide at an olfactory receptor-related biallelic marker or the complement thereof. 

15. A polynucleotide according to claim 14, wherein the polynucleotide is used in an assay
selected from the group consisting of: a hybridization assay, a sequencing assay, an enzyme-based
mismatch detection assay, and an amplification of a segment of nucleotides comprising said biallelic
marker.

16. A polynucleotide according to any one of claims 1-15 attached to a solid support.

17. An array of polynucleotides comprising at least one polynucleotide according to claim
16.

18. An array according to claim 17, wherein said array is addressable.

19. A polynucleotide according to any one of claims 1-15, further comprising a label. 

20. A recombinant vector comprising a polynucleotide according to any one of claims 1-15. 


21. A host cell comprising a recombinant vector according to claim 20. 




22. A non-human host animal or mammal comprising a recombinant vector according to 
claim 20. 

23. A mammalian host cell comprising an olfactory receptor gene disrupted by homologous
recombination with a knock out vector, comprising a polynucleotide according to any one of claims
1-15.

24. A non-human host mammal comprising an olfactory receptor gene disrupted by
homologous recombination with a knock out vector, comprising a polynucleotide according to any
one of claims 1-15.

25. An isolated, purified, or recombinant polypeptide comprising a contiguous span of at 
least 6 amino acids of a sequence selected from the group consisting of SEQ ID Nos 12-21. 

26. An isolated or purified antibody composition capable of selectively binding to an
epitope-containing fragment of a polypeptide according to claim 25.

27. A method of genotyping comprising determining the identity of a nucleotide at an 
olfactory receptor-related biallelic marker or the complement thereof in a biological sample. 


28. A method according to claim 27, wherein said biological sample is derived from a 
single subject. 

29. A method according to claim 28, wherein the identity of the nucleotides at said biallelic
marker is determined for both copies of said biallelic marker present in said individual's genome.

30. A method according to claim 27, wherein said biological sample is derived from 
multiple subjects. 

31. A method according to claim 27, further comprising amplifying a portion of said
sequence comprising the biallelic marker prior to said determining step.

32. A method according to claim 31, wherein said amplifying step is performed by PCR. 

33. A method according to claim 27, wherein said determining is performed by an assay
selected from the group consisting of: a hybridization assay, a sequencing assay, a microsequencing
assay, and an enzyme-based mismatch detection assay.

34. A method according to claim 27 wherein said olfactory receptor-related biallelic marker
is selected from the group consisting of A1 to A13 and the complements thereof.

35. A method for the screening of a candidate substance interacting with an olfactory
receptor polypeptide selected from the group consisting of SEQ ID Nos 12-21, or fragments or
variants thereof, comprising the following steps:

a) providing a polypeptide selected from the group consisting of the sequences of SEQ ID
Nos 12-21, or a peptide fragment or a variant thereof;

b) obtaining a candidate substance;

c) bringing into contact said polypeptide with said candidate substance; and

d) detecting the complexes formed between said polypeptide and said candidate substance.

36. A method for the screening of ligand molecules interacting with an olfactory receptor
polypeptide selected from the group consisting of SEQ ID Nos 12-21, wherein said method
comprises:

a) providing a recombinant eukaryotic host cell containing a nucleic acid encoding a
polypeptide selected from the group consisting of the polypeptides comprising the amino acid
sequences SEQ ID Nos 12-21;

b) preparing membrane extracts of said recombinant eukaryotic host cell;

c) bringing into contact the membrane extracts prepared at step b) with a selected ligand
molecule; and

d) detecting the production level of second messenger metabolites.

37. A method for the screening of ligand molecules interacting with an olfactory receptor
polypeptide selected from the group consisting of SEQ ID Nos 12-21, wherein said method
comprises:

a) providing an adenovirus containing a nucleic acid encoding a polypeptide selected from
the group consisting of the polypeptides comprising the amino acid sequences SEQ ID Nos 12-21;

b) infecting an olfactory epithelium with said adenovirus;

c) bringing into contact the olfactory epithelium of step b) with a selected ligand molecule; and

d) detecting the increase of the response to said ligand molecule.

FIGURE 1 (sheet 1/2)

[Figure: multiple sequence alignment of the olfactory receptor-like proteins encoded by orf-1 to
orf-10, with a consensus line; alignment positions 1 to 200 are shown and the transmembrane
domains TM1, TM2 and TM5 are labeled.]

tatatctttg tgagtcctac gttaaaccca ttgatctacc gtctgaggaa taaaaatgtt 3420

aaaagaacaa taagggaagt tatccaaaag aaactgtttg ctaagtaagg tagatatttt 3480 

agttgcaggt tatgtaatac attattttta tcttaccaat taacgagcat tataaattaa 3540 

caaatcactt tctgtcattg agtgtttttt gtcttttgta acttgcatat gggaattgaa 3600 

agtgtatacc aaattattag ctagagttga cagtgtcatc tcagtgaatt taagaagaaa 3660 

tcatagaaat ttaaatagaa gacttatggc atgtaaaagt caataaagaa cagtgattcc 3720 

ttctttagta ctcatattgg tagcaaacga taaaagacag aatgcaatgg aaattacagt 3780 

tcattacatt tttatagtac ttaataactt ccaaactatt ttctagacac ctttcaaaca 3840 

tagtatatga agttttctcc tttcttttat acagataatg caacaataaa gatcactgat 3900 

gtagggaaaa gagagatcag actgttactg tgtctatgta gaaaaggaag gcataagaaa 3960 

cttcattttg acttgtaccc tgaacaattg ttttgtcctg agatgctgtt aatctgtaac 4020 

tttgccccaa ccttgagctt ataaaaacat gtgttgtatg gaatcaaggt ttaagggatc 4080 
gtcatcgcca ttctccagtc tccataaacc aggggcacaa tgcactgtgg aaagtcacag 4200

ggacctctgc cctggaaagc cgggtattgc caaggtttct ccccatgtga tagtctgaaa 4260 

tatggcctcg tgggatggga aagacctgac cggcccccag cctgacaccc gtgaagggtc 4320

tgtgctcggg aggattagta aaagaggatg gcctcttata gctgagataa gaggaaggcc 4380 

tctgtctcct gcctgcccct gggaactgaa tgtctcggta taaaacctga ttgcaccttt 4440 

cttctattct gagataggag aaaaaccgcc ctgtggcggg aggcgagaca tgttggcagc 4500 

aatgctgctt tgttattctt tactccactg agatgtttgg gtggagagaa ggaaaaatct 4560 

ggcttacgtg cacatccagg catagtacct ccccttgaac ttatttgtga cacagattcc 4620 

tttgctcaca tgttttcttg ctgaccttct ccctattatc accctgttct cctaccacat 4680

tcctcttgct gagatagtga aaatagtaat caataaaaac tgagggatct cagagaccgg 4740

tgccgatgca ggtcttccat atgctgagcg ccagtcccct ggggccactg ttctttctct 4800 

atactttgtc tctgtgtctt atttcttttc tcagtctctc atcccacctg acgagatata 4860 

cccacaggtg cagaggggca ggccacccct tcaattgaag tatatctcag aatactactt 4920

tgagatacag tcttagaatt atattttgag ccaatgaaat cttctttctt gaagcttttg 4980 

aagcaatgcc aaatttccgt tagtaggctt tataaatatc attgtttgca ttaccaggag 5040 

gcattcacat caatatgtga cctcacttct ccactctttc attgccattg aagcagatac 5100 

tttcaagtat gtcttaatat attgattttt atcttctcat tgggggaaca tgggaagtgt 5160 

cacatgtggg actacaccgt aatttgggta tttgtagtct taaggttttc atgaagcttc 5220 

gtgtgggcct ccatttctct agaacgattt gatgtgttcg ttttttatcc ttcacagcaa 5280 

cacatgctta ggcagatgaa tcactgcagc agcatttaga cacatttgtg attcagggat 5340 

agatagctct tcagtaggat ggtgtgaatt ttgggataat ggcacatact taaaacagaa 5400 

ctaccttttg acccagcaat cccattactg ggtatataca ccaaggaata taaatcattc 5460 

caccataaag acacatgcat gtgtatgttc atcacaacag tattcacaat agcaaagaca 5520 

tggaatcaac ctaaatgccc ttcaacagta gattggatac aaaaaaatgt ggtatatata 5580 

catcatggaa tgctatgcag ccgtaaaaaa aaagaatgag attatgtctt ttgtagcaac 5640 

acggatgaag ctggaggaca atatccaaag caaaccaatg caggaacagg aaaccaaatg 5700 

ctgcaagttc tcacttacat gtggaagcta aacattgaat acacatggac acaaaaaagg 5760 

gaacaacagg caccaggacc tacttgaggg tgaagtgtgg gaggagggta aggattgaaa 5820

ctctgcctat caggcactgt gcttatcaac tgggtgatga aataacctgt acaccaaaac 5880 

cctgtgacat ggaatttacc tttataacaa agctgcacat ggacccctga acctaaaata 5940 

aaagttaaaa aaagaaatct gtcccaagga gactgttttc tcttaatgtg ctgcatcctg 6000 

cttaatgaac tatggatttc atgcattctt tttcaagatt atattgccta cctgattgta 6060 

gacagtttga tgcattttac atagtatcag ttaaacatta aacataatta agagcatttg 6120 

gcttcaaaat aatagtaaat gggtagaatt tattatggtt atagtactac tcatacaaat 6180 

aataatacaa catcagtgat gtagtgtcta gtgagcatga cactattata gaacacttct 6240 

taggctggat tttgataata atagcatgct ataacttttg aataaaaata gtaaattgaa 6300 

taatcacaaa caagtaaaaa tctaaagggc cagtagtatg tattcaacta gcttaccagt 6360 

gtatcatttg tgtagctaaa tccattcgct gtccctctag cagacacaca tgctagttat 6420 

tgcagtggaa gaaaaatgaa atgaactaag gaattaatgt ctttgagtaa tataaacaga 6480

gccttctgga ggttttctat gaaaaataac ataagtatgt gtaaaactct tctttgagta 6540 

agtatcatac acatagctaa attctgtatt ttctattcat tgctgtataa ataaattaac 6600 

attgccttct ttcgtgtgga catctgatca tgtgattcca tacttaaaat tacatatact 6660 

ttgctctttc taaatgaaaa tagcccattc atctatactt ttcatctaac cattcagtta 6720 

ttatagggct agttttattt ttgtcagaaa attcactttg taaatcttat gttttattat 6780 

gtagccattg tataccaaca tataaaagaa aatacataca tacatacaca catacattag 6840 

ggaatttttt cttcaaagtg aaagcagtct acttaaaaat tatccaaatc ttcaacattt 6900

ttatgcaaaa caaggtacct ctacattaag ggaggaggaa atgagggaca ccagttttat 6960 

aatcttatga taccttatat tccatcttaa tttttttttt tgagatgaag tctcgctctg 7020 

tcacccaggc tgaggtgcag tggtgtaatt ttggctcatt gcaacctctg cctcctgggt 7080 

tcaagcaatt ctccctgcct cagcctcctg agtagctggg attacaggca cgcaccacca 7140 
ttttagtaga gacagggttt cgccatgttg gccaggctgg 7200
gtgatctagc ggccttggcc tcccaaagtg ctgggattat 7260 
ggcttccaaa atatttaagc aatattattg cattttacaa 7320 
aaaataaaac agaataattg agatagagaa gactttacac 7380 
taggaatgcc ctatattctc aaatttcatg ggatgtaaca 7440
tatatagcta tattttatta aatgatttct cagagtataa 7500 
kaaaattagg gatgattctc cgtcttggtc agactgtact 7560 
tggaactata tgtgcactgg aagatacaga ctaatgaacg 7620 
ctccaacaaa tacagtgcta cacaattttc aaatattctc 7680 
tctaaatcaa caattttatg catcatcaat tttataaata 7740 
gatgattagc atgttaatat taaccttgat agtgcagact 7800 
acgcccgtcc tccaaaggat cctgggccct gaccaatact 7860 
atgccactct tctgctctca cagtctttga aatctttcat 7920 
ccacaacttc ctacccctca ctcattagat gacctcactg 7980 
tgaagagttc tgatgagaat tccttgtttt catcatacta 8040 
ccatgttttc ctcttccatg ttattgaaag ggatggctaa 8100 
atctcatgca cacttgtacc gaaactctac ataaatgtaa 8160 
aggagagtct akctccagga agcatcaaat gaaggttatg 8220 
ttttaattgg ctctttagaa ttagtttaaa aatactctag 8280
attgtaaatc ttgctacctt gttttcataa gatatggtcc 8340 
cattttctac atttgttcac taaaaataca atttgtgtgt 8400 
gtgtgtgtgt gtgtgtagag ataagtattc catcaatggg 8460 
atttcactag acttttctat ccatatgtta gtattgcatt 8520 
atgtaagctc ttttgatgta tacatgagaa ctaaataaac 8580
tttggagtag aaatttttac aatcttatgt tactttcaaa 8640 
ttgagttgga agctaagacc ccttttaaaa attatctgtg 8700 
atgagtgggt tagtggcaca aaaatgggaa caaaaaaaat 8760 
agaattgcat atctgctgtt taaatgctgc attcctagaa 8820
tctaaaagca tcctacattc atgttttctt cacagaaaag 8880 
gtaacttttt ctgtgtgttt aatctctgtt ccctctatta 8940 
ctattataat gtgtgctccc tgagctggga gacttgttta 9000 
ggaacagaaa gaaatgaagc tatacagtat acatgagtaa 9060 
gtggaaaaaa acaagtcttc attttgtacc ctcttagcca 9120
ttctctagtt taatcgttcc tgaataaagg taaggcacac 9180 
aatgctttgc ttttctttct tcattttgta tctgaaacaa 9240 

tataataaag atcagacttc ctttatttat ccatttgaaa 9300 

tttgtattta attttcattc ctttggattt tttgtttcct 9360 

accaaatgtg gagtacacca agaagacagg tataaatgta 9420 

tgtatacatg tatggcagag agaaatagag aatatgtatg 9480

ttgatgtata gaaagataca gattaaaaca gacatatagg 9540

ttccgatgtg attattgaaa caagagaagt aattgtcacc 9600 

cgaatgataa atggatgaaa caaatgccaa atctgaatca 9660 

ttgtcacttt cagtttcaag agataagaag atgttctccc 9720 

gaattcattc tcttgggact gacagacgac ccagtgctag 9780 

ttccttgcga tctacctaat cacactggca ggcaacctgt 9840

accaattccc acctgcaaac acccatgtat ttcttccttg 9900 

atttgctatt cttccaatgt tactccaaat atgctgcaca 9960 

accatctcct acgctggatg cttcacacag tgtcttctct 10020 

gagttttact tccttgcttc aatggcattg gatcgctatg 10080 

cattacagtt ccaggatgtc caagaacatt tgcatctctc 10140 

tatggcttcc ttaatgggct ctctcagaca ctgctgacct 10200 

tcccttgaaa tcaatcattt ctactgcgct gatcctcctc 10260 

gacacccgtg tcaaaaagat ggcaatgttt gtagttgcag 10320

ctcttcatca ttcttctgtc ctatcttttc atttttgcag 10380 

gctgaaggca ggcacaaagc cttttctacg tgtgcttccc 10440 

ttttatggaa ccctcttctg catgtacgta aggcctccat 10500 

tccaaaataa ctgcagtctt ttatactttt ttgaccccaa 10560 

agcctacgga acacagatgt aatccttgcc atgcaacaaa 10620 

cataaaattg cagtttaggc ttgtgtttat ttgcagtcac 10680 

aactggcttt tgaaatggaa aaacctagtg tagtcgtgat 10740 

tcagtaacca ctttactttc ttatccaaat gaaaaccttg 10800 

aaaagcctta atgttgagaa atttaaaatg ttttatttgt 10860 

attttttagt atctaataat tctatatgaa aatactatgt 10920 
taaattttta actaacttag cctgagagaa attctgaaaa 10980

tggtccacag agccataatg aggttcaaga tagggtggat 11040

atgaaattat ttctacttca atatggtgtc ttgaacaata 11100 

attgtaatca atgtgcaaaa aatataaata tatgggtcac 11160 

aagtgttccc caaacccatc ttgtaaatat cttgagaagc 11220 

aatactggtt atcgtccttg attctgaaaa tcaactctca 11280 

aagtgtatgg atccaattct atcatcaaac tactttaact 11340

agtgccgtac ctctgactta ggactaattc agtaggagtc 11400

ctgtgcatac ccataaatgt cttgaaacag tatacaaata 11460

tctggaggaa atactcagtg tttattttgc ttcatgaagc 11520

taaaataagg gtgttctagt agttcttgaa gctgctatag 11580 

gctacctaag gagacacatt aatccatgca ttattagtca 11640 

tacatatgta tttagcccta ataaattgcc tcatatgttt 11700 

tctgatatta actaacacca tacagaatat gctacagaaa 11760 

actatatcta ctataaagtg ttcaaatggt tttccagaat 11820 

tttttgtaaa atgggggcat attcttgagt ctcatttgat 11880 

tcctctttca gaaacagtat ctgaaaaaga aggaaggcaa 11940

acaattacat tttagcactg tttgtttttg tatttgcttg 12000 

acaacagagg aaggtcatct gcagagacaa caatgaagca 12060 

acacagagtc ctcacttcct ttttgagggc cagcatcatt 12120 

tattttatga gaaatttcca tctattaggc aaagcattgt 12180 

aatggtctca tatccagatt tgaaacccta ttfegaaaatt 12240

aaaatactct atatttggcc taaataactg tacgattcca 12300 

attgctctgg cttgcagcaa ccatttttaa actttgacat 12360 

acctgtttct aattgatttt taataaaata ttttatttat 12420 

tttccttctg catgtctttg ggtaaaagtc acctcagaaa 12480 

acataaatcg acatgaataa aagtaattat aagagttgca 12540

ttaggaacac tttattttaa tcttcataaa tgaaataatg 12600

tacaacattg gagcctgagg cacggagaga ttaaatttct 12660

ctcagtagga caaaagcagg tttcttggtt tcaaattctc 12720

tagttgttat taaagaaatt ctcaaaacac tgcaaatgct 12780

tgaattgatg agattattgc tatagatgaa atagtggtta 12840

tcctaagaat agctaatgag aatttccctt gaaaattccg 12900

ttataattaa aattgtctta acgtccatgt gaagggagaa 12960

atatatgcta tttaggctgc attttattaa tatattttcc 13020 

ttcatattaa ttaattacca ggtacagaca aatatgaggg 13080

tgtcctgtga caaaggatac attttcttta tattttgtta 13140 

atatgcgtac atgttactaa aatgcatcaa gcaaatgtga 13200

tttgaacact gattttcact tgatataatg tatgcacctt 13260

gatgaacatc atgttgaatg tcatataata gttagatcat 13320

atacatatca attgtagaaa actgactgaa ccatattgac 13380

acatttactt ataatcacct acctgaagag agctgaactc 13440 

tccatttgat tatcttgcaa accatgattc catcatatca 13500 

gactgatgta tctgtgcatc tctaattgtt ccttttgtgt 13560 

gcfcgaattaa atgttttcaa tgcttttgaa actcctgcta 13620

tttctccaga aagcttctag caacttaaac ctgtactaga 13680

tcactataac tctatcagta gtaggagctc ttgtttatgg 13740

tcaagggaag tcttagaaaa taattctgtt ctaataagca 13800

atattgaaca aatgaaggct acacataagg ttgataattt 13860

aatttcttgc atgaataatg tttgtatttc tatttggaaa 13920 

gctgttatgt aagttttcct catcctaaaa ttgaaaaaac 13980 

atatttgata gtgatatcag gaattatttt agatataata 14040 

atattaaaaa ttatcttttc cctcttattt ttaaccaatt 14100 

caatgctttg tttttcaact gatttgtgat gtcaccttta 14160 

tttctgccta tttcctcttt atgtttggcc acaatataca 14220

tatgattagt gtacacttaa aatatgtttt aatttctgat 14280 

tttctcttag gagactgcaa atcatctgat gattgctttt 14340 

ccataaatta tgcatactat tgttcaattt agaaactata 14400 

atatatattt caatgtggct atacatgaaa tacttaaaca 14460 

aawgaaaact attttcctct ctatgggggc cataggaaaa 14520 

tttccagaaa ggcaggaaaa cgttggtggc ctgtgagtaa 14580 

gtatattaat tctatcgatt cattaacaac tttatacaaa 14640 

cggcgttcat aataaaacac tttttttcaa caaatatctc 14700 
taaatacaga ttcttttctc taactatgct gactttttgc caacaactcc acccaaaaga 14760

gttgacatgc ccacatatat tcttctgcta ggaatcattt caaacactga aaaaacccta 14820 

ttaattagaa ttaaaataaa tattaaacat ttaaataatc agagaaaaat tactctgtgc 14880 

ttcctaagaa tgggaatact tcacagtaat actgttgctt taatttgaca ataaaactat 14940 

ttaaaaataa gacccaccct acagtctggc aatgtatttg tcacaaataa gacaaagtgg 15000 

cacaattact gaaatatgtt atgttaatat aaatcatttg aagcactaaa acatgaccat 15060 

ataaataaat gttgaatact atacgtagaa aattgattaa aatacaaaat gctaaaatta 15120 

acgcagaaaa tagtggcagc taaaaacatg aagtacaata ctgtttatgt gtctatattt 15180 

aagacagctg aaaaagtcta gcacgttcat acattggttg attaatggta taaaattttt 15240 

ctagaaataa atacaatatt attaatattt taaggagtca aatgtttcta taattttatt 15300 

cagctattcc aaatttaaga atctactgaa ggcataatta gaaacttgtg tgaagagttt 15360 

catggtgttc attttataat caagaaaaag caaacaaatg atgttcgaaa tacaagtagc 15420 

tcccacttag caccgtacca tgtaacagaa acctttctat atggcaaacg tgtcttgttt 15480 

tgcatggttc ttacttagat attacaaaat ctcaaagtga ggactgcctg ctttggataa 15540 

ctggtaaagc acaggcatct accagatggg gtagaaggtg tttctaaata actagaatgt 15600 

catcagcctt gatcatgaag gaaagtgtaa agtacagttt aatttttttt tcttagaatg 15660 

ttatacagct attaaagtga ggatgttcct agttaagaaa aaggtaaatc attttgtgta 15720 

atatattatg tgcaaagaca tatccaatgc tacatatgga tatcttgtaa aatatcaggg 15780 

ataaagagaa acaaaactta ttaatagcta tgtccaaatt ttccatgata agcaattgtt 15840 

gctattacat tgagaaaata agctaacaaa aactttgtat gcccagttta attcagatct 15900 

caggatatat aattacagga tgcttattta taatagtaga atttaaaaat aaaagtagga 15960 

aaaagcaagg agttaaaaac ttgttgtaga gcgtgatggc agaagaggag gtgctcagtt 16020 

ctggtttcca gaacagaatg accagttagc aacttctcac agataagaat gcctaggtaa 16080 

aaattccata acccaagggt gaggtggggc acactcttgg agcacagaaa ctgggaaaag 16140 

tcatattaga agtgtaaggg gagcagtttc actttgactg tgttgcttct ttcccgacca 16200 

aacagtgtca cactgaaagg gatttcttgg gcctgcgttc tctagtggag aaaagagagc 16260 

ccacggcaga catccaactt ccctgtgttc cagggcactt cccaaggggc ctagttctgt 16320 

ctaatcttgt ggggaataat ggaggaattg gcagggcttg accccttatg gtcagttcat 16380

aactacctcc ttctacaacc cattctgtat tccctttacc ttcagcaagc accttagcag 16440 

gacacggttt tttacctggt ggagtgacac aaatcttcat tcctgatggg tctgggccat 16500 

ttgtagtttt gcctggattg ggttgttgta gtttctcact gaccttaatc acaggacatg 16560 

gtaatactgt gacgtcctat gggatctcct gtattccaca aatactcttc cttaacctcc 16620 

attgtggagt agtagtttga tttcatcttg atagtctggg tcagtcaccc cagccaacac 16680 

tgtacctccc ttcttagcct gttgacttag aggcaggaag agctcaaagt ggccaggtgg 16740 

caaacttatc ttccagtttg atggaatcat ctttctgtct tctggtggca aagttcctcc 16800 

ctccggaacg aagacctcta ggcccacaga acataatgtc acagaaacag gaaggaaaat 16860 

tttgctagtg ggtcagtaga ggtgatggta agtggcacca cttccacttc caccccttga 16920 

ttcctgcatt cttggatcct ggctatggaa gaaacagtac cacatattgg atgcctactc 16980 

agagcataca aagacttctg gagaaatttg ccccaggcct gcattttttt tttttttttt 17040 

tgtaaaatga agtcctgctg tgtcttccag aggcctatag tgcagtgtca cgatcttggg 17100 

tcactgcaac ctctgcctcc caggttcatg cgattctctt gtctcagcct cccgaatagc 17160 

tgggattaca ggcatgtgcc accacggctg gctaactttt gtatttttag tatggatgag 17220 

ggttcaccat gttggctagg ctggtctcga actcctgacc tcaagtgatc cacccgcttc 17280 

agcctcccaa agtgctggga ttacaggcat gagccaccgt gcccagcccc tgtaataaag 17340 

tatcacctag ttgttgtaac tgttacttca aaaggctttt ccacctttct atcaatccat 17400 

ctgctttagg ataatgggaa acatggtaag acaagtatat cccatgagca tgagcccact 17460 

gtcttattct ttagctgtaa aatgagtgcc ttggtcagag gcaatggtct gtggaatacc 17520

atgacagtgg ataaggcatt ctgggagtcc acgaatggta atcttggcag aagcattgca 17580

tgagagatag gtaaacctat atccagagga agtgtctaat gtgactaatc cactctagaa 17640

ttctaatcac ccaaagcttt tggattcctc cctctacatt aaaccaggga agatctggca 17700 

tttcaaggtc gctcacagta ggccatcctt tgatccatat ttcagctaat aaactattag 17760 

aacatttttt aactccctga tttgtaacat tgagtgcaga atctctgctt agtgggccca 17820 

tatcaataca ttcagcctga tccaacttta tattcctccc accattatcc cacaccctta 17880 

atatccattc ctacacctgt tctccagaat tctgcttata taaactagaa aactcaagca 17940

gctcttttag ggtgtagcac acctccttgt agacttttgg cttacaccaa ctagcaccag 18000 

atcttggaag gagaaaaggg attaaagtct gaatgtattt cctaaagaag aagaccaacc 18060 

aaaataggtt atttaaatat ttggctattc taagtccaaa gctactcttt aaaaggaagc 18120

catatttaac cactccatgt aatatgtaat tatgtcttga catacagaac taggtaatgc 18180 

aataattaaa ggtatttggt caccataatt tttttctaaa aacggagaaa tgtgcaaatt 18240 

atttacctac cagtttacac aaacaggcaa agcaaaaaca aaagcacttg tcaaggtgga 18300 

caaaaatgct gctacagtta aaaatctgca tatactcttt actttgtcat tcttaaaaat 18360 

gtgtttctgg gtagttggtt tactgaaata ggtgaaagct gtactgtaat gtcattttta 18420

aattttgtaa cactttttgt aacattttgt aacatttggt gaagaaatta tgtttcttca 18480 
cttaaggaga taaattgtgt tcattataga actttgaaag tgaacgacca aaccaaatgt 18540

gcaaaaattg actgaatatg aattacttta cacttatgtg tgaaaataat cttcaacatt 18600

atgtattcca tacacatgtt tttggtgttt taaattatct ccgggctttc ctaaattcaa 18660

gttggagaag cagaattagt tagatttata gcacactcac tatagaaaac atattttgaa 18720

aacttgtttt accacatctc tgttcagctg gatggggctc ctgtgaaaga acaaagcaat 18780

tacatagaat tttacacaga attgtaaaac ctgcaaattt tttttaacat tttctttacc 18840

ttatctgata gcacagcaca tacgttgcaa ctcttactct gaaaactact atcacttctg 18900

aatgtttatt aaataacagc cataggtgtg cgaaggattt acaaatggat agccacattt 18960

actgccaatg aaaaacaaag tttatttaac atgtgcagta tctttgtgat atatatatta 19020

ctccttattt ggtacagatg aaaacgctga agctcagaag aatgaaataa tgttttccag 19080

gaacacagca agagctaggt ttcaattaaa atattattct cgtagctact tctgggttga 19140

ctctagaaac accccaagcc acatgggttt gttaagatgt ttaactgggt aattattttt 19200

atgggtaatt atgtttcaga cagttttaca attttacaga tataaaaata tatatgaaaa 19260

agatatatat atgttgtaat gcatatatat atattcatat ttcatatgaa gaaggcacaa 19320

gaggaacact cagtgcctag gtaaaaagaa tggagttgga tagtaggtta atagtggttg 19380

atgagataag aagtagtcat ggtcatgact tactttcctg gtgtcatgca atttctattg 19440

acagcatttc ttcaaagttg taatttttat tgagaataaa ttaataaaac tctgtaagat 19500

tttaagataa aagatctccc aatattttca aagcatcatt acaaccttct ttaaagtttc 19560

tagcaattgt atcaatttct ttattcttat tttcattgta gaaaaaaatt gatttttgag 19620

gtatyttaaa agtcatagag aaaatatttg gccatggaaa gggtccaaaa tgggaggagg 19680

aaattaaaaa ggactgagat ttctgaaaca catacattgt tccaagggag taccaagcat 19740

gctacaaagg aggcagagga ttgaatgact gaagaaaaga ttttgaagga cactttgagg 19800

tgttcttcct acaccttcca tcatcctatc tcaggagggt gaatggattg gagagattat 19860

aaaaagagtt attgtgatgg ccttgatgtt agtccaaaac atccttttct agtctatgtg 19920

attgataagg atgctgaagc tgttcatggc tgtggaaagg gaaggggaga ggagagttat 19980

cccattgctg tactgtatct gagtgacttc acctcagagg aaaggaacag agaagcatct 20040

ctaaaccaca ttctcaaatc caaattcacc agcaataatg ctagtaattt tcatatctga 20100

ctcccatgtg tagactctaa ttgagtcagg atccaaaagg agggcaattc aatggctcct 20160

ttaagttcta acatgatgaa tgtagaagta ttccaaattt ttagctaaaa taatatttat 20220

cataatagtt attattactg ctactaactc actgcaagac ttatggaata actactatat 20280

gatgctctgg ccaggatctg agatgtccac aattctactc atatgtttta gctaaagaga 20340

atttaatgaa aagacagaga ataaaaaggt aaagttaaag gaatcagcca aaggggttaa 20400

aaaagtcaag gctaactgta gaaaaccatt tccatcttta ggtctgaagg accaaggaca 20460

gtggcttctg tcacagagct cagtgagagc tgaagccagg aagaaatccc ttgccaacat 20520

cttcccaaga cacacatccc aagacattct ccagggaggg atgcagccaa tttagcacac 20580

fctrtaatgaa gcaggaaggg aatgagaaag aaattgaggt aattccctat ctttcccact 20640

tctgacatcc tagagagtct tctcaaaagc caaggtcagc tggaaaccag acaccatagg 20700

tgccctgttg accaggtcca taggagccca actccaggtg aacaatgcaa ggcacagaag 20760

agaagacaat agatgtgaat gggagtaggg tggaaaagga gaatcaccag cacacctata 20820

ttatctataa atccaaaatg ttccacactt acacattact atccctatta aaaagataaa 20880

tgtacaaaac tcaaaattga tatatcacta gttcaagttc aacaaaagtt agtttatctt 20940

ccaatggtgt tccagactct gaaaccttzct cccgatacca tccactatac ttgctgtaag 21000

caggaactta atatttccca aagtctaaca tattaaatta tttgggagta aagtaaaata 21060

aataatgaat cccattccta catcactctt gggggatgcg attccctgaa gacagtgtgg 21120

agttggcaaa actcatgtgc tccgtgagct taggattttc agaagaatcg aaatattgtg 21180

gttgttcata atttgaatga ctttttgcta gaaataattg acctgcaggg atttggtctc 21240

cacacagtaa ttgcattata aactctacag gttctcaata cccattctgt aatcatgaaa 21300

tcatgcttct ttctcaggta ggattagata tattttttaa aaagtttaaa atccggcagt 21360

aaatatttta attgatagaa ataaatggag ctgaaacata ttctgatgaa ttagaaaatg 21420

tttatattaa agtgttgcta ttcttttcag agtactggtg tttacagagt gtttgcataa 21480

tgaaaagaaa ataggccaaa agaattttta tttcatcaat aattattgac tgcctccaat 21540

tcaggaggca caaatctagg tgctgctgag tctataacta gatcagataa aaatccttga 21600

ccctaaagga gctcatgctt tgatgttgag tttcgttcta gactagacct ggatcattca 21660

ccaaataata ctagtagcct caatttgtta agcatgtagt acatgacaga tatattttaa 21720

aaacttttac tgaattaact ttctcatcaa gagagaatgg ctattaatca aattttatgg 21780

ataagaaaac tgaggcacac agaaattaat aacttgcccc aggccacaca ggtaggaact 21840

agaggcactg ggtttttcac ccaggaagtg tgactgcaaa ctcccagtaa ctgtcccact 21900

ctgctggctt tcaatagtaa actatgctta ctgtttatac ttgatttaga acaatcaaga 21960

tctggtatta aatcatcctt ctgattcatt ttgtgttaac aatgccactc tttactatta 22020

taagagctgt atcatgtaat attttatagc cctttcataa agaatgggct acttttaggt 22080

ttatagccta tattacttaa aaataataaa aatatgtttt gtgaagctgc cttctctaga 22140

tttctttttc cactccaaat tgaaccatga ccactttcca aggagacacc actcttgtta 22200

aagacaaaga ccaaccttct agactctttg taggacagaa tagaagaagc aagcattcac 22260
atgggaatgt ctcagtgtag ggtgcatgtc aggagagcac aggctcagat gcctccaaag 22320

ccatttctgt ccaattgcag aagcttctcc ctgagatttc cttgctcaaa gcagtaggat 22380

attgtgttct tgttttttca gaagaaagac tatttccaca aaattctcag ttaaattatt 22440

tggactattt attgcttgca ttacttcaac ttccaataga aaaaaaaaaa gactcataaa 22500

aaaacagatg aattgagaag ttattttgaa gggtgattaa tatttcaata aaaaagccca 22560

ttaaaaatgt aattaggcgg ggcacggtgg ctcccgcctg ttatcccagt actttgggag 22620

gctgaagcgg gtggattgcc tgaggacagg agttcaagac cagcttggcc aacatagtga 22680

aaccccgttt ctactaaaaa tacaaaaaaa ttagctgaac gtggtggtgg gcacctgtaa 22740

tcccagctac tcgggaggtt gaggcagcag tattgcttga acctgggaga cagaggttgc 22800

agtgagctga gattgcacca ttgcactcca gcctgggcga caagagtgaa actccgccac 22860

acacacacac acaaatgaaa tccagcacca aaacagtatt tgcctatttg gcaattaaat 22920

gtaccatcaa acagagggat agagaatgtg gtatattata tatatatcat acattataat 22980

gtgatataat atataataca ccacagtttc tttatccact tgttgtttga tggatatata 23040

catatatatt atatgtaata tatatacaaa atgtatatga tggaatacta ctcagccata 23100

aaaagggatg aagtaatggc attcacagca acttggatgg gattggagac taatattcta 23160

agtgaagtaa ttcagaaata aaataccaaa tatagtgtgt tctcactcat aagtggaaac 23220

taagctgtca ggatgcaaag gcataagaat gacacaaagg acttcgggga cttgggggaa 23280

acggtggagg aggtgaggga taaaaggcta cacattgggt tcattgtata ctgtttggat 23340

gatgggtgca ccaaaatctc acaaatcacc actaaagaac ttactcatgt aaccaaatac 23400

caccttttcc cccaaaacct atggaaataa aaaataaaaa taaaaaccaa aagaaattca 23460

attggaccct gtgagcttta acaaggtaat tatgattcta ttgatttagg tgacttatct 23520

tctaacttat taccctaggc agaagcccaa tgtccttttg tgtagaatga gataatacgg 23580

tggacaagtt ttgaacattg aatttacaga gtgctttatt tatgaagtga cctgtttcca 23640

gtgttggcat ggtagttatt aaaaaaaagt tagtataaca ttgtcaagtc tggatatgtg 23700

tcatagaaga aacacgtcga ttgttccatc atgatcttca tggtcacttt tatcttgccc 23760

aaatgttgag agattcaggt caggatattt taagtatggt ctcagtcatg ttgttaatat 23820

gattatgatt tcactttgat ctgttacatg ttttaacaaa ttcatattga acttgtgttc 23880

caatgttttt tcttagtttt tcaagtgtta gaatgaggta gagttaaaac aacagcattt 23940

tagtttgagg ataagtttct tcatcatttt atccttattg cccccctccc caggagccct 24000

ttgatacact attgttttct ttatatccct atcaacattt aataccatca atacaggtgc 24060

acttgtgtat ctatatacca tgcaaatgta tatggtttat gtattatgtg catatgtatt 24120

ataaatacaa atatgtaata tctactttca ttccccaaga accaattttt gcccccttgg 24180

acaaaaatat acaaaatgaa gaaatgtagc ttggtcttta aagttaaaaa agatttgttg 24240

ataagagggc cagctgaggt agaataaagt gaagaaaatg ggcacacatt agagggtgag 24300

ggaagatctg gaaaatgatc tccagttcac aggggcaggc aaagcgattc ttgttctaca 24360

gggatgtata ccaggcatca gtcccacttt cctaagcacg ttcagttgtg ataaacctgg 24420

agggtttcaa aggctgatac tttagatccc acattcaaag gtgtgttgtt aaacaaagaa 24480

ttacagtttc aaagaaaagc aatgtttaca accatgggtt caagaaaagt ctaagtgaac 24540

acatataaca aagacttgca aaaagataaa agataaggct ctttaactat caaaagactt 24600

gcagaaaaga accacagaaa accattttaa atattattgc ctttgtatat taaaaaactc 24660

tatattagtt tagatgttta aagcatcaat cacatgctca ctaggctatt tcttaatgtc 24720

acatgtattt acattttgag agaagaggaa gaaatagcag atgacaccac tggggtaatg 24780

cataaatgac aaacctaaat gcattttaat ttccttttat ttagatgtca tttgaagcca 24840

agcaaacaca atgttaaaga aaaaccatac agccgtgact gagtttgttc tcctgggact 24900

gacagatcgg gctgagctgc agtcccttct ttttgtggta tttctagtca tctaccttat 24960

cacagtaatc ggcaatgtga gcatgatctt gttaatcaga agtgactcga cactacacac 25020

tccaatgtac ttcttcctca gtcacctctc ctttgtagat ctctgttata ccaccaatgt 25080

tactcctcag atgctggtta actttttatc caagagaaaa accatttcct tcatcggctg 25140

ctttatccaa tttcactttt tcattgcact ggtgattaca gattattata tgctcacagt 25200

gatggcttat gaccgctaca tggccatctg caagcccttg ttatatggaa gcaaaatgac 25260

caggtgtgtc tgcctctgtc tggctgctgc tccctatatt tatggctttg caaatggtct 25320

aagcacagac caccctgatg cttcgtctgt ccttctgtgg acccaatgac atcaaccact 25380

tttactgtgc ggacccaccc ctcttagtcc tcgcctgctc agatacttat gtcaaagaga 25440

ccgccatgtt ggtggtggct ggttccaacc tcatttgctc tctcaccgtc atcctcattt 25500

cctacacttt catcttcact gccattctgc gtatccacac tgctgagggg aggcgcaagg 25560

ccttctccac ctgcgggtct catgtgaccg ctgtcactgt cttctatggg acactgttct 25620

gcatgtacct gaggccccct tctgagacat ctatacaaca ggggaaaatt gtagctgttt 25680

tttatatctt tgtgagtccg atgttaaacc cattgatcta cagcctgagg aataaagacg 25740

ttaaaagaag tataaggaaa gttattcaaa agaaactgtt tgctaagtaa ggtagatatt 25800

ttggtcatag gcgttggaat ctgttcttat tatctgacca attaatgaac atttaaaatt 25860

aacaaatcaa tctgtcattg agtgtttttt gtcctttgta atttgcatat gggacttaaa 25920

agtgtatgtc aaattattag ctagagctta cactgccacc tcagtaaatt gaaaatgaaa 25980

gcatagaaat tcaaatataa gactagaaga cattgttcta gctctgtaaa aagtaatgaa 26040
gtacagatga ttccattatt agtactcata gtactaatat atcatagtac tatatcatag 26100

tactaatata tcatagtact atatcatagt actaatatat catagtacta tatcatagta 26160 

ctaatatatc atagtactat atcatagtac taatatatca tagtactata tcatagtact 26220 

aatatatcat agtactatat catagtacta atatatcata gtactatatc atagtactaa 26280 

tatatcatag tactatatca tagtactgta tcatagtact aatatatcat agtactgtat 26340 

catagtacta atatatcata gtactgtatc atagtactaa tatatcatag tactgtatca 26400 

tagtactata tatcatagta ctatatatca tagtactata tatcatagta ctatatatca 26460 

tagtactata tatcatacta aaatatcata aaagacagaa ttatcaattg ctgagaaaat 26520 

gacagctcat tacatttttt taaatgtttt tgtttgtccc tttgaatgaa tggaggatat 26580 

ctacctaact tctatgtatt cagaatctca catgaggcag cacatttagc acacttgttt 26640 

agttgtaatc ttctatgctt atcttcgtga gtctgtttga aaagttggat aaatggcaat 26700 

gagtgactac atttgctaat tacattttat aatacttaat aagcctccaa gctattttcc 26760

agaaacttgc caaccatagt agatgaggtt tctgtttttc ttttatacat attatataga 26820 

aatgaagatc ctttaggtgt actttggatt attactttaa gttatatcct tagaatgata 26880 

tcatagatcc aatgaaaccc tatttcataa ggcttttcat ggaaatgcca aatttccttt 26940 

agaaagagtt tttaattgtt tgtatgttgt accagtaagg attcccatca acatgtgaac 27000 

tcaattcttc actcttttat taccattgcc tgtttccttt gaataaataa attgaattga 27060

atgttttatc cacgaatgca tgtggatatt tgtgtgtttt atgtgtgtat gtaggtgtgt 27120 

gtgtgtattt tgattacaat ataacatttt ctttttttta attgcttatt caaatttttt 27180 

tggtgagaat caaggtcgta tgcctttttt tttttttttt ttttttttga cagagtattg 27240 

ctctgttgct caggctggag tacagtggcg cgatctcggc tcactgaaag ctccgcctcc 27300

cgggttcacg ccattctcct gcctcagcct cccgagtagc tgggactaca ggcgcccgcc 27360

accacgcccg gctaattttt tgtattttta gtagagacgg ggtttcaccg tgttagccag 27420

gatggtcttg atctcctgac ctcgtgatcc gcccgcctcg gcctcccaaa gtactgggat 27480

tacaggcatg agccaccgcg cccagctgct tcacctgttt ttaaattgtg gtgtttatat 27540

gttttcttat ttatttcaga tatttctcaa tgatataaac atgtttttgt tatatataca 27600 

ggtgtagcaa ttgtatatat atacacatga attccatttt atctcttcta ttgtgtcttt 27660 

tccaaagtta atattttatc attttaaatt ttctcaatat ttttttattt cactttattg 27720 

tgtctttttt tacttttagg aagttaagtt tatgccacct atgatcatgt ggatttttac 27780 

atatattatt aaattatttt catggtttcg cttgttaatt ttttcttatg aactattttt 27840 

aatatacatc taaaatagaa aataatataa agactccttt gtaagcacaa ctaatcacta 27900 

ttaaatcttc acagaatgcc ccatttgttt cagttatttc agaaatttaa aacttacaaa 27960 

gatacatctc tcctgactga ttctgtctct tgtcccattt cttgctgtaa actctattac 28020

aaatttaatg tttcttatcc ttctgtagag ttacatcatt ggataggtgt ttatatattc 28080 

atagaaatat gtagtactga tttgctacat ttaaaacttg aaataaatgc atccactcta 28140 

tgtatctttc tgcaatttat ttatcttgca ctaaatattg ttggatgtat ctgtggtgaa 28200 

tcatgcagca gcaattttta gtggtatagt ctatgccaat atataaatat gcctcaatat 28260 

gtgtatttat tctacgattt tttttttttt ttgagacaga gtctcactct tgttgcccag 28320

gctggggtgc aatggcacga tctcggctca ccgcaaactc cgccccccag ggtcaagcga 28380

ttctcctgtc tagcctccta gtagctggga ttacaggtgc ccaccaccat gccaggctaa 28440

ttttgtattt ttaatagaga cgggatttca ccatgttagc caggatggtc ttgatcacct 28500 

gaccttgtga tcctcccgcc tcagcctccc aaattgtatt ttatttctac tgttggtgaa 28560 

gagttacatt ttccagtttc ttgataatat gaaaatccat taataaatgt tttgtgtgta 28620

attatgaatg tgtaccacgt aaggagttcc tctcaggtag gttggtaatt tcttggtcat 28680 

aagatataag tatctttaaa tttactatat attgccaatt gttttactga tttatatttc 28740 

tatcaacagt ttgtgagtgt tacctttaca tcactttatt cttgccaaaa ctaggtgttg 28800 

tcagacttta aatgttttct aatctgatgg gtaaacaatg aattctcatt tggtgtttaa 28860 

ggttgaattg ttattacaag tgaaactaag tatttttaag tatattgatg ctaatctcat 28920 

atttctcttt cacttatatg tccatatact ttgttaactt ttcaggtagg gtgatttgtc 28980 

tttatttcat acatttataa tagttcctta tgtaccaaat gctacataca aatcttttgg 29040 

tgttatatgc catttaacct taaaaaccta gggtaataat gccaggcagt gtggctcaca 29100

catgtaatcc cagtactttg ggaggctgag gcaggaggat tgcttgagcc caggagtttt 29160

aggccaaact acacaagata gagactctgt ctttacaaaa aaataaaaaa aaattagcca 29220

gttgtggtgg gatgcccttg tggtcccagc tacatgggaa gctgaggcag gaggatcact 29280

tgagcctggg aggtagaggc ttcagtgaac catgttgtca ccacgacact ccagtctgga 29340

tgacagagga tggatgacag agcctggatg actccagcct ggatgaccct gtctcaataa 29400

aacaaaacaa accccaaaaa acctagggtg agatgaacag atgaacgtta tttcttgaat 29460 

gcctcagcaa atagccatag gtatctattt ttctggtagc tgcaatttga ttgggacaag 29520 

tttggaattt cagctcattt gacaagtatt ttcataatag atattttccc agtaggattg 29580 

gatttttttt ttccacatag tgaatgctta tatgtgagca taaaatgttt cccacagatt 29640 

gggattggat aaattatatg aagtgtgatc ctagtatatt gctttgccat tgaaatactg 29700

gttattaccc aaatagaaaa ttaacattta aaagagccaa caggctgggc aaagtggctt 29760

aggcctgtaa tcccagcctt ttgggaggcc aaggcccaca gatcacttga gttcaggagt 29820
tcgagactaa cctgggcaac atgaggaaac cctgtctcta aaaaaaccac aaaaattagt 29880

taggtctggt ggcaagtgtc tgtagtccaa gctactcagg aggctgatgt gggaggatgg 29940

cttcagcctg ggaggcaaag gttgcagtga gctgagatca tgccagtgca ctccagcctg 30000

ggtaacagag taagatcctg tcttaaaaaa ataataaaat aggccaggtg cagtagttca 30060

tgcctgtaat cccagcactt tgggaggccg aggcaggtga atcacaaggt caggagttcg 30120

agaccagcct ggccaagatg gcgaaacccc gtctctacta aaaatacaaa aattagccgg 30180

gtgcagtggt gggtgcctgt aattccagct acttgggagg ctgaggcagg agaatcgctt 30240

gaacccgggg ggcagaggtt gccatgagcc aagatcgtgc cactgtactc tagccttggt 30300

gacagagcca gactccaact caaaaataaa taaataaata aataaaataa taaaatataa 30360

tataataaga atgccaacag aataattctg atttcagcta cagtttatgc ttttttatga 30420

gaaattcaac tccaatcttt taattttatt ctcctttatt ctgttgttca gacattgcat 30480

attaaaaagg catttttttt cttaaccaat agttttatcc ttattgctgt ccttattctg 30540

gagagaactt catatttcaa ggcacagccc aaatatcaac tcccttgtcc tttcatgagc 30600

aaacagttgc acatacccat tcctgtgatt gtgtagatat tgcaactcta tttgtctgtt 30660

agtcacacag aaaattcata ataatacaat gccatgtttc ctaatatatt ctgagagttc 30720

tgtgagaaat gaaacctatt gtatcggtta agtactgaag atggccgaat aggaacagct 30780

ccagtctaca gctcccagcg tgagcgacgc agaagatggg tgatttctgc atttccatct 30840

gaggtaccgg gttcatctca ctagggagtg gcagacagtg ggccgaggtc agtgggtgcg 30900

cgcaccctgc gcgagccgaa gcagggtgag gcattgcctc actctggaag cacaaggggt 30960

cagggagttc cctttcctag tcaaagaaag gggtgacaga tggcacctgg aaaatcgggt 31020

cactcccacc ccaatactgt gcttttccaa cgggcttaaa aaatggcaca ccaggagatt 31080

atatcccgca cctggctcag agagtcctat gcccacggag tctcactgat tgctagcaca 31140

gcagtctgag atcaaactgc aaggcggcag cgaggctggg ggaggggtgc ccaccattgc 31200

ccaggcatgc ttaggtaaac aaagcagcca ggaagctcga actgggtgga gcccaccaca 31260

gctcaaggag gcctgcctgc ctctgtaggc tccacctctg ggggcaggac acagacaaac 31320

aaaagacagc agtaacctct gcggacttaa atgtccctga cagctttgaa gagagcagtg 31380

gttctcccag cacgcagctg gagatctgag aaagggcaga cttcctcctc aagtgggtcc 31440

ctgacccctg acccccgagc agcataactg ggaggcaccc cccagcatgg gcagactgac 31500

acctcacacg gccgggtact ccaacagacc tgcagctaag ggtcctgtct gttagaagga 31560

aaactaacaa acagaaagga catccacacc gaaaacccat ctgtacatca ccatcatcaa 31620

agaccaaaag cagataaaac cacaaagata gggaaaaaac agagcagaaa aactggaaac 31680

tctaaaaagc agagcacctc tcctcctcca aaggaaagca gttcctcacc agcaacggaa 31740

caaagctgga tggagaacga ctttgacgag atgagagaag aaggcttcag atgatcaaat 31800

tactccaagc tacgggaggt cattcaaacc aaaggcaaag aagttgaaaa ctttgaaaaa 31860

aatttagaag aatgtataac tagaataaca aatagagaga agtgcttaaa ggagctgatg 31920

gagctgaaaa ccaaggctcg agaactacgt gaagaatgaa gaagcctcag gagccgatgc 31980

gatcaactgg aagaaagggt atcagcaatg gaagatgaat tgaatgaaat gaagcaagaa 32040

gggaagttta gagaaaaaag aataaaaaga aatgagcaaa gcctccaaga aatatgggac 32100

tatgtgaaaa gaccaaatct atgtctgatt ggtgtacctg aaagtgacag ggagaatgga 32160

accaagttgg aaaacactct gcaggatatt atccaagaga acttccccaa tctagcaagg 32220

caggccaaca tccagattca ggaaatacag agcacaccac aaagatactc ctcgagaaga 32280

gcaactccaa gacacataat tgtcagatac accaaagatg aaatgaagga aaaaatgtta 32340

agggcagcca gagagaaagg tcgggttacc ttcaaaggga agcccatcag actaacagcg 32400

gatctctcag cagaaactct acaagccaga agagagtggg ggccaatatt caacattatt 32460

aaagaaaaga attttcaacc cagaatttca tatccagcca aactaagctt cataagtgaa 32520

ggagaaataa aatactttac agacaaccaa atgctgagag attttgtaac caccaggcct 32580

accctaaaag agctcctgaa ggaagcacta aacatggaaa ggaacaaccg gtaccagccg 32640

ctgcaaaatc atgccaaaat gtaaagacca tcaagactag gaagaaactg catcaactaa 32700

cgagcaaaat aaccagttaa catcataatt acaggatgaa attcacacat aacaatatta 32760

actttaaaca caaatggact aaatgctcca attaaaagac atagactggc aaattggata 32820

aagagtcaag acccatcagt gtactgtatt caggaaaccc atctcacatg cagagacaca 32880

cataggctca aaataaaagg atggaggaag atctaccaag caaatggaaa acaaaaaaag 32940

gaaggggttg caatcctagt ctctgataaa acagacttta aaacaacaaa gatcaaaaga 33000

gacaaagaag gccattacat aatggtaaag ggatcaattc aacaagaaga gctaactatc 33060

ctaaatacat atgcacccaa tacaggagca cccagattca taaagcaagt cctgagtgac 33120

ctacaaagag acttagactc ccacacatta ataatgggag actttaacac cccactgtca 33180

acattagaca gatcaacgag acagaaagtc aacaaggata cccaggaatt gaactcagct 33240

ctgcaccaag catacctaat agacatctac agaactctcc accccaaatc aacagaatat 33300

acattttttt cagcaccaca ccacacctgc tccaaaattg accacatact tggaagtaaa 33360

gctcttctca gcaaatgtaa aagaacagaa attataacaa actatctctc agaccacagt 33420

gcaatcaaac tagaactcag gattaagaat ctcactcaaa accactcaac tacatggaaa 33480

ctgaacaacc tactcctgaa tgactactgg gtacataacg aaatgaaggc agaaataaag 33540

atgttctttg aaaccaacga gaacaaagac acaacatacc agaatctctg ggacacattc 33600
aaagcagtgt gtagagggaa atttatagca ctgaatgccc acaagagaaa gcaggaaaga 33660

tccaaaattg acaccctaac atcacaatta aaagaactag aaaagcaaga gcaaacacat 33720

tcaaaagcta gcagaaggca agaaataact gaaatcagag cagaactgaa ggaaatggag 33780 

acacaaaaaa cccttcaaaa aattaatgaa tccaggagct ggttttttga aaggatcaac 33840 

aaaattgata aaccgctagc aagactaata aagaaaaaaa gagagaagaa tcaaatagag 33900 

gcaataaaaa atgataaagg ggatatcacc accgatccca cagaaataca aactaccatc 33960 

agaggatact acaaacacct ctatgcaaat agactagaaa atctagaaga aatggataaa 34020 

ttcctcaaca catacactct cccaaactaa accaggaaga agttgaatct ctgaataggc 34080 

caataacagg atctgaaatt gtggcaataa tcaatagctt accaacgaaa aagagttcag 34140

gaccagatgg attcacagcc gaattccacc agaggtacaa ggaggaactg gtaccattcc 34200 

ttctgaaact attccaatca atagaaaaag agggaatcct ccctatctca ttttatgagg 34260 

ccagcatcat cccgatacca aagcctggca gagacacaac caaaaaagag aattttagac 34320 

caatatcctt gatgaacatt gatgcaaaaa tcctcagtaa aatactggca aaccgaatcc 34380 

agcagcacat caaaaagctt atccaccatg atcaagtgga cttcatccct gggatgcaag 34440 

gctggttcaa tatatgcaaa tcaataaatg taatccagca tataaacaga accaaagaca 34500 

aaaaccacat gattatctca atagatgcag aaaaggcctt tgacaaaatt caacaacact 34560 

tcatgctaaa aactctcaat aaattaggta ttgatgggac gtatttcaaa ataataagag 34620 

ctatctatga caaacccaca gccaatatca tactgaatgg gcaaaaactg gaagcattcc 34680 

ctttgaaaac tggcacaagg cagggatgcc ctctctcacc actcctattc aatatagtgt 34740 

tggaagttct ggccagggca attaggcagg agaaggaaat aaagggtatt caattaggaa 34800 

aagaggaagt caaattgtcc ctgtttgcag atgacatgat tgtatatcta gaaaacccca 34860 

tcatctcagc cccaaatctc cttaagctga taagcaactt cagcaaagtc tcagaataca 34920 

aaatcaatgt gcaaaaatca caagcattct tatacaccaa caacagacaa acagagagcc 34980 

aaatcatgag tgaaatccca ttcacagttg cttcaaagag aataaaatac ctaggaatcc 35040 

aacttacaag ggatgtgaag gacctcttca aggagaacta caaaccactg ctcaaggaaa 35100 

taaaagagga tacaaacaaa tggaagaaca ttccatgctc atgggtagga agaattaata 35160 

tcttgaaaat gtccatactg cccaaggtaa tttacagatt caatgccatc cccatcaagc 35220 

taccaagggc tttcttcaca gaattggaaa aaactacttt aaagttcata tggaaccgaa 35280 

aaagagcccg catcgccaag tcaatcctaa gccaaaagaa caaagctgga ggcatcacac 35340 

tacctgactt caaactatac tacaaggcta cagtaaccaa aacagcatgg tactggtagc 35400 

aaaacagaga tatagatgaa tggaacagaa cagagccctt agaaataacg ccgcatatct 35460

acaactatct gatctttgac aaacctgagg aaaacaagca atggggaaag gattccctat 35520 

ttaataaatg gtgcagggaa aactggctag ccatatgtag aaagctgaaa ctggatccct 35580 

tccttacacc ttatacaaaa atcaattcaa gatggattaa agacttaaat gttagaccta 35640 

aaaccataaa aaccctagaa gaaaacctag gcattaccat tcaggacata ggcatgggca 35700 

aggactttat gtctaaaaca ccaaaagcaa tggcaacaaa agccaaaatt tacaaatggg 35760 

atctaattaa actaaagagc ttctgcacag caaaagaaac taccatcaga gtgaacaggc 35820 

aacctacaaa atgggagaaa attttcacaa cctactcatc tgacaaaggg ctaatatcca 35880 

gaatctacaa tgaactcaaa caaatttaca agaaaaacaa acaaccctgt caaaaagtgg 35940

gtgaaggaca tgaacagaca cttctcaaaa gaagacattt atgcagccaa aagacacatg 36000 

aaaaaatgct catcatcact ggccatcaga gaaatgcaaa tcaaaagcac aatgagatac 36060 

catctcacac cagctagaat ggcaatcatt aaaaagtcag gaaactacag gtgctggaga 36120 

ggatgtggag aaataggaac acttttacac tgttggtggg actgtaaact agttcaacca 36180

ttgtggaagt cagtgtggcg attcctcagg gatctagaac tagaaaaacc atttgaccca 36240 

gccatcccat tactgggtat atacccaaag gactataaat catgctgcta taaagacaca 36300 

tgcacacgta tgtttattgc ggcattattc acaatagcaa agacttggaa ccaacggaaa 36360 

tgtccaacaa tgatagactg gattaagaaa atgtggcaca tatacaccat ggaatactat 36420

gcagccataa aaaatgatga gttcatgtcc tttgtaggga tatggatgaa attggaaatc 36480 

atcattctca gtaaactatc gcaagaacaa aaaaccaaac actgcatatc ctcactcata 36540 

gatgggaatt gaacaatgag aacacatgga cacaggaagg ggaacatcac actctgggga 36600 

cggttgtggg gtggggggag gggggaggga ttgcattggg agatatacct aatgctagat 36660 

gacgagttag tgggtgcagg gcaccagcaa ggcacatgta tacatatgta actaacctgc 36720

acattgtaca catgtaccct aaaacttaaa gtataataat agtaataata ataaattaaa 36780 

aaacagaaac attatatcta tctccttgtt ttcatagcaa gtcatcaata catgtttcta 36840 

aattgagaaa gaataattac aattgaaggc acatagtgat gaaagaaaaa ttgttgattc 36900 

tgacatttgg agctgaaaat atttaatagc tacaatcttt aaaagtcagt atttgaaatg 36960 

aaaagttctt ttatatatgg ttgggtgatc tgttggataa tgttttattc ttcttaaagc 37020 

aaggtgtcct tgttctttgg acatattttc agtgttgtga gttgtctttc cttgtctctg 37080 

tctctttctc tctgttgccc tctctgtatc actgcctttc tctcaattcc tatttcataa 37140 

tttggtttag cttatattgt gtcttacaga ctcttcacat gttggcctca gccttacata 37200 

ctttcaggac aatctcagta cagacaaggc tgctgcttgt ttttggaaat gactcctctt 37260 

gcctgaaatg gctcaagtgc acagcaggtg tatgtgtgtt tgttggtgtg tatgtattgt 37320 

gtatgttcat gtatggtgct ggtggaggtg caatcataca acgagaataa tttttgctct 37380
acaatagcta
ctgttcatta 
gatagtttat 
agatacatta 
gcactgccta 
ttttcgcact 
gactgatgca 
ttgcttgctg 
gtagtatatg 
gaggatcaga 
aaccttaaaa 
catgtgtttt 
catgcttatt 
ttgaaaagtc 
aaggccaata 
atctaagttt 
cttccatgat 
ctctcaactt 
ttagctttcg 
taactttgtg 
atctagcaag 
atttatgggg 
gtttgacaaa 
tgggaagtgt 
tcaggaattt 
agtatccaga 
gacagttttg 
agatagatag 
agagtagtaa 
atgtttgaga 
atttctgtgt 
ccagtttttc 
gtttcagttg 
ccattgaagt 
taaggacaga 
tgaactctaa 
aggaagaggc 
gaaaggagtg 
ttgggagcta 
tgttggcagc 
gatatggcct 
ggttgaggcc 
aatctttgtt 
tgttgaattt 
gctcacaact 
ttgtgggaga 
tagtcttcct 
gttttaaaat 
attcacaatt 
agctattctt 
gaatttttat 
aaatatttta 
acctttgttc 
tctgtacact 
cttagtaact 
ttttcaagaa 
gttaacctga 
ctaattaatt 
atggacattt 
tgtgttttta 
tgctgaaatt 
ttgactatat 
ctttaatttc 



ctaccattta 
acaaacttct 
cctgttcaga 
gatgatctca 
ttataatgca 
ggtttgctat 
ttgtcatttt 
actgtctatt 
atactcgcac 
acaaaatcag 
atgtgactgt 
ctcttttctg 
cagtgtacct 
actggtcaca 
accatttgcc 
ctcttcttct 
actggggaca 
gaattatttg 
atgcttaagc 
gctctgtttt 
taatttgatt 
taaattcact 
tgtgtacagc 
gagaataggg 
tttcttgaat 
aacagttgac 
ggaaggtcat 
atgatagatt 
cccattatag 
ctctgtatgg 
catagtttta 
acagcccatt 
cttcacatta 
gagtgtgaag 
ttttattcag 
ttttgttttc 
atgagcaggg 
ttgatccatg 
gtagacaaga 
cttgagtttt 
tgagctgtta 
tagtcaagaa 
atgcggtgtt 
cttttcatgt 
tagactttct 
tatttatatg 
agtctatggt 
ttgatgacac 
agtgctttta 
ccattttttt 
ttccaccaat 
atgaacatat 
aaaaatccat 
taacctcttc 
cttaggtgga 
agccttcaga 
ttttacttca 
agttatgtaa 
aagtagtgca 
tgacagcatt 
tgatagaagt 
tgaatattct 
gctctgcaga 



ttttactttt 
agttaggtaa 
tgtgagaaag 
catgcacaat 
tagaattctg 
gctaatattg 
ccccataact 
tgttctaaca 
ataattgttt 
atcataagac 
tttccttctc 
tttctgttat 
gcttttcaag 
tcagcccaca 
ttgaccacac 
tctctccttc 
aaaaaagaga 
ttttaaaaaa 
ttttgacttg 
ctatttggtt 
tgcttaattc 
tacattaaat 
catgttatca 
caagtaaaga 
aaatactcac 
ttttgacttt 
tactatgtca 
catcaagatt 
ggatatacta 
atatgtcttc 
tgatgaacta 
gaattatttt 
tcattgttaa 
tgttacaaag 
taatatacta 
acagagataa 
ctcaagagag 
tgaaacccac 
gctctttctc 
ctcaggcagg 
gccactgtgt 
agtgctcaga 
tcataatggg 
gatcgtttgc 
tttttttttt 
ttcttaatac 
ttgccttttc 
acacacactt 
ttttcttcct 
ttttttcctg 
ttagttttga 
ccatattttc 
tggctacata 
atcatcctta 
tgatattaac 
aacccctgtt 
attacaatga 
atactacaat 
aattatgcaa 
tattttagaa 
ggctttaaaa 
aatgcacaag 
cgtttgtgct 
tgcagtaaaa
gaaggtgtta 
aatgttaaaa 
tcctatgtcc 
ttggtggaaa 
ttatatgttg 
tgccatatta 
ttatttttta 
ggactaaatt 
atgacagaac 
ttttttgagg 
actgcctgac 
gaaaatgtaa 
tggatactct 
ccaattccta 
ttcttatttt 
agataagaga 
gtatagggga 
ttgtaatcct 
ttgtttttaa 
tgaggaaaga 
gtaggaactt 
caattccaat 
tgccacaaag 
ggaattgttg 
tttttagtgt 
atcagaaagt 
gtatgaacag 
taattggttc 
ttttgccttg 
taaatttata 
cattattata 
tctttgtatt 
ataaacaaag 
ttgcaatgga 
ctgtgtattc 
tcagagaagt 
ccagttttgc 
cacatattgg 
ttcttccagg 
ttgtgttttg 
ggattctttc 
cataatttac 
tgttcatgta 
aattgaactg 
caatcatttg 
attatcttag 
tatatatata 
aagaaatctt 
gaggacttgt 
gtattggcat 
cagtagcagt 
tacgagtgga 
gatccacaca 
ttaagggaca 
caacttgtgt 
tcaattgatt 
ggtcagataa 
caatatttgt 
tcagatcatg 
ttattgatag 
tgagacgtat 
tttcagtgtg 



tttttttctt 
ttaactaaga 
ctagtgtcta 
attttgcaaa 
aatacaagtg 
aaagtgcatg 
gatcactaca 
aaacaatcag 
agcagtttga 
acatatagct 
cttcttagag 
agaaacatca 
tttgtgtgtt 
ggattcttat 
cttaaaactc 
ttttttttct 
tgaactagtg 
aaatgggttt 
gggtatggaa 
tgattctctg 
aaaaatttcc 
taagtgtgca 
tttcaaggct 
ttcactgtcc 
caagcttttg 
tctcatggtt 
tgattcttag 
tttgttcatt 
atctatgcat 
gataaatact 
taaaatttta 
tcttgttatg 
gtcaatattt 
ccagctacca 
caggtaggtt 
taaaggtaga 
aaaaaaatta 
caactggctc 
ctgaaacaga 
agtactacag 
ttcaagtctt 
tagagtttgg 
attttcttga 
tcttctttta 
tttatatttg 
tcagaattac 
cattattttg 
taattatata 
tgctttcctc 
agatttagca 
gtggtgtgtt 
ctccattaaa 
tctatttgtg 
ctgctttgaa 
gtttttcaca 
ttgaagacat 
gatgtctgct 
ttggacacat 
tcactttcca 
aagttttgca 
atttaggaag 
gtctttatta 
tatcttttgc 



acccactgct 
gtttgttaat 
ctctattagg 
catatttact 
tgtccagaca 
atttgagatt 
ttaatttatt 
aaacttctta 
ccatccttat 
ttgagactaa 
ttgaaagctt 
ctttacttac 
ggaaaggaat 
ctccatagag 
tttatctaac 
tttaaccagg 
gctttctgga 
gaggaacacc 
agaactctga 
gatgttaaac 
tttgagttaa 
gttcagtaaa 
actgtgcagc 
ttaccaagat 
attactttct 
ttcatggagg 
atagatagat 
tttattgctg 
ctgctgatac 
gaagagcaaa 
taaagggcca 
tgttataaga 
tttattcctc 
gttaaagtag 
cactgtaaac 
atgaggaaat 
caaaaagtgt 
ctaccctccc 
cagtatcttt 
tcatcccagg 
tttaggccaa 
tccaggagga 
caaatgcaga 
tgaagtgtct 
gaccagtgaa 
ttactatgaa 
aagaatgcgt 
tatatatttt 
aagatcataa 
tttacattta 
ataggaagtt 
atacctttgc 
atctctttat 
tgctatactt 
tactttattt 
taaatgttta 
agaagtgcac 
ggtactgacc 
aaggggtttt 
aacatcatcc 
aattaacaca 
tttaggtctt 
acacttttta 
tttttgtttc
gtattgcttt 
ttttttagac 
ctatgcaatg 
ggggctacag 
ctccctacct 
agccatcgca 
atcatcttcc 
taaagcatca 
tatttagtag 
tagccaattg 
aactgtctga 
atacatcaat 
tcagtaatat 
gcatgtgtaa 
accaaatcta 
ccatattaaa 
ttgtctatcc 
taactcttat 
tgaatataca 
cattctgttt 
tgagagacac 
taccatttta 
tattatactt 
catgtgccat 
taatgctatc 
ttcctatgtc 
ttggttttct 
catccatgtc 
ggtgtatatg 
tcaagtcttt 
gcagcatgat 
aataccattt 
taaggcttta 
gggatgtaac 
tcaaagaata 
agagcccatt 
ctaatgaaca 
aatattcttc 
tttataaacg 
tagggcagac 
gaccaatact 
tacaaatcct 
gtaaaggacc 
tggttttatt 
ttgaaagtga 
actgtacata 
atcaaataaa 
ctttaaaaat 
aactatggtg 
aactttttct 
atttcactgg 
gcaagctcct 
ttgagcaata 
gagttggcag 
agtagtttag 
taattgttta 
ctaaaagaat 
tgtatctttc 
ctggaaaatt 
tatgcagtca 
ttggttttgt 
tgacaatcat 



tgagtagctt 
tatgtaactg 
agagtcccgc 
tctgcctccc 
gcacgttgcc 
caaataatct 
cctggcctgt 
aaaatataca 
tgtcatctgt 
gtaggacttt 
agtaatcaca 
acatggtgtc 
tggtccttcc 
agatagataa 
aactcatctt 
taagtaaatt 
atttttgaga 
attcagttat 
gttttattga 
tatacataca 
aacaaagagc 
cagtattata 
tttttattta 
taagttttag 
gttggtgtgc 
cctcccccct 
catgtgttct 
gtccttgcga 
cctacaaagg 
tgccacattt 
gctattgtga 
ttataatcct 
tacttttaaa 
aactcaattt 
aaatctacat 
accaccattt 
ttgattcatt 
catgtctagc 
cttctatttt 
gtcctggtct 
tagttccagc 
tctgcttcct 
tcctttcctc 
tcaccccagt 
atattaaaat 
tggctagttt 
aatgttgtat 
tgttaagccc 
actctaggga 
agttaacatt 
gtgtgtggag 
acttttctat 
ctgatgtata 
atttttacag 
caaagactgc 
tggcatataa 
tttactgttc 
ctcacattca 
tctgtgggtt 
tgttcactcc 
gtgtacatga 
atattcttag 
aggtttctgt 



atgattttat 
tatgagtgtt 
tctgtcaccc 
aggttcaacg 
atcacaccgg 
gcccacctca 
atgagtggtt 
cttaatttcc 
gaataacaca 
cagtaccata 
aacacacaag 
atttgtatag 
agcagaagca 
tataaaactt 
tgagttagta 
aatgtcacct 
ctttgctctt 
tattgagcca 
aatgttattg 
cacatactat 
caaattgtca 
gtcctatatg 
tttatttatt 
ggtacatgtg 
tgcactcatt 
ccccccactc 
cattgttcaa 
tagtttgctg 
acaagaactc 
tcttaatcca 
atagtgcccc 
ttgggtatat 
ttagtgcaag 
gaggaaaaaa 
tttgattttg 
gtgatctaag 
tgtgctaacc 
ttcccttatc 
cagccctatt 
cagacctgtc 
ataaatccag 
tctactgatt 
aagtctttca 
gtctatagac 
tacaaaactt 
ctctgtttgt 
agggatttca 
ttgaggaaca 
atatagctaa 
aacaggtagg 
attagtatcc 
tcatatgtta 
tgtgagagct 
tcatagctta 
ttttaacaat 
atggagacaa 
aaatgttgca 
ttgttcttca 
taaccattat 
aataacccca 
ataaacaggc 
ccatatatca 
ttgtttgttt 
gtgatatttt
tttgggtttt 
agattggagt 
gattctcctg 
tttcactatg 
ggctcccaaa 
tttgtgtatt 
catttgtata 
gatttgtttc 
ttgaaccgac 
taaaatctta 
ctaaatccat 
aagtgacatc 
tctgaaggct 
taatatacat 
tctaatgtgt 
cctcatcaaa 
gaatcatttg 
catattaaca 
agggcatatt 
attagtttta 
acccttcatt 
tatttatttt 
cacaatgtgc 
aactcgtcat 
cacaacagtc 
ttccaaccta 
agaatttgct 
ttcatttttt 
gtctatcgtt 
aataaacaca 
actcagtaat 
gaatcaaaac 
atagctatgc 
atatatagct 
agaattaggg 
tggaactatc 
atctaacaaa 
cctaagtcaa 
tttgattctt 
aacgtccgtc 
atgttgtcac 
gctgccacac 
aaaactggag 
atctatctat 
gctaaagctc 
tcttcaaagg 
aatgatatta 
gaaatatata 
tcatttgcta 
aatcaataga 
acatttcatt 
aaataagcaa 
ttttcatgaa 
gttctgtgat 
aaaaacttca 
ttctcagaaa 
cagacccagc 
ttctcatatt 
ggaattgtaa 
aggtaaaatt 
agcactaagt 
gtttgtttgt 



taagttttat 
tttattttgt 
gaagtggtat 
cctctgcctc 
ttggctaggc 
gtgctgggat 
gaatttgtaa 
ttttaaaata 
ttttcaattt 
tacaactttt 
agggccaata 
taggtgttcc 
gactaaataa 
ttctatgaaa 
agctaaatct 
ggacatctga 
aataaccaat 
tagtcagaag 
tatgtaagaa 
gccttcaaat 
tgcattaagg 
ctatagaaat 
ttattttatt 
aggtttgtta
ttagcattag 
cctggagcgt 
tgagtgagaa 
gaaaattgag 
atggctgcat 
gttggacatt 
cgtgtgcatg 
gggatggctg 
catacagata 
tcaaatctct 
atattgtgtt 
atgcctcttc 
agtgcactgg 
tgcagtgtaa 
caattttatg 
agcttgttaa 
ctccaaagga 
tctgctgctc 
ctttctaccc 
agttctgatg 
attttccttc 
tgatgaacac 
agaatctagc 
taattggttc 
tcttgccacc 
catttgttca 
ggaaacccat 
gactctaaaa 
tttctttagg 
gaactcagag 
tctctaacaa 
ctctcttcct 
aggctacttt 
ttttacttga 
ataatgtgag 
tacacacaaa 
tcaaagtgaa 
ttaactaatt 
ttgagacgga 



attctactgt 
tttgttttgt 
aatcttggct 
ccaagtagct 
tagtctcaaa 
tacagttgtg 
ttgtaatcat 
acttttataa 
gtatactttt 
aaataaaaaa 
ggatgtatcc 
actagcagac 
actaatgtgt 
cataaaataa 
gtattcccta 
tcatgtgatc 
ccattacttt 
agtcattttg 
aacaaataaa 
tgaaagctgc 
gatgaggaaa 
atttaagaaa 
ttattattat 
catatgtata 
gtatatctcc 
gatgttcccc 
catgtggtgt 
tttccagttt 
agtattccat 
taggttggtt 
tgtctttata 
ggtcaaaaga 
aaaactgaga 
caaatttcat 
gactgatttc 
tgtcttggtc 
aaaatacaga 
acaagtttta 
actcatcaat 
tgactttcaa 
cctgggccct 
tccacatcct 
ctcaccctca 
aaaaccttct 
ttccatatta 
atgtaacaaa 
tgttcaaact 
tttagaatta 
ttgctttaca 
ctaaaaatac 
attttcatca 
tcaattatat 
gtgcaaattt 
agagtggctt 
tattttaata 
tcaggtgcta 
gaatagtcat 
aattataata 
ctccatgaaa 
aaaatcaagc 
aaaacagttc 
taatgttcac 
gtctcgctct 
gtcgcccaga
ttcacgccat 
gcctggctac 
tcttgatctc 
gcgtaagcca 
tttttcttct 
tggacttcct 
ttgttttttt 
aagacaagta 
atagagaatg 
catataaaga 
gcctagatga 
tcagagagaa 
ccccaaacca 
tagagaagat 
tgtgcatgat 
ttggtcacct 
acaatttcct 
tcttcatcgc 
atgtagccat 
ctctggtcac 
cctttcactt 
ctcttatcat 
caggctttac 
cagcgatctt 
cccacctgac 
catcagagaa 
caatgctgaa 
aaatgattag 
ccctaagtgc 
cagtatgggc 
gtttttaaaa 
atgaatataa 

gggggtaaac 

tattactatc 
aaaaaatgat 
ataggtggat 
gtctctccta 
ctcaggaggc 
agatcacgcc 
aataaataaa 
tgtggtgtct 
atgtaaatat 
tgtaaacatc 
ttctgaaaat 
atttctatca 
gacttaggaa 
aaatgacttt 
tctacagagg 
aatgctactc 
agaaagcaag 
cagactaaag 
tgtgttgatt 
tgataaaagc 
gcgtgttctt 
gtcatgaccg 
gtcaccctgg 
taggggcgta 
tggcacatta 
aatcacaaag 
gactgcatat 
taatattttg 
tatgatatca 



ctggagtgcc 
tctcctgcct 
ttttttgtat 
ctgacctcgt 
ccgcgcccgg 
gttttgtatc 
acatttattt 
tgttccctca 
taaatgcatt 
tgcctgtgtg 
gaaaacgttg 
atagatgaat 
tcctcacatt 
caccatagtg 
cctgtttggg 
cctgctgatc 
ctccttttta 
ctcagaacag 
cctagtgatc 
ttgcagccct 
tgtgccttac 
atccttctgt 
gctggcctgc 
tctctcaagc 
caggatccgt 
aatagtcact 
gtctgtagag 
cccattgatc 
gggaaaatcc 
ctgtggggta 
tcttagtaac 
ataaaaagct 
attgtttggt 
ttttaactaa 
catagagcca 
gccgggcatg 
cacctgaggt 
aaattacaaa 
tgaggctgga 
attgtactta 
taaataaata 
tgaacaatag 
atgaatcacc 
gtgagaagct 
caactctcaa 
tcaaaatact 
taatttagta 
gttagtggaa 
aaaaatttca 
caaagagatc 
tggaggaatg 
taagttgtgg 
taaagaaaac 
atatactctt 
gtaggtattt 
gccaggactg 
atctcctaga 
cgtatctttt 
tttttttaca 
aggctaaata 
taatacatac 
cactctgctt 
actactgtat 



ctggtgtgat 
cagcctcccg 
tttgagtgga 
gatctgcccg 
ccgacagtca 
tgaaataata 
atttaaatga 
caaaccttga 
aatgtataaa 
tgtgtgggtt 
tgcagaaatt 
gagtgaatga 
ctttgtcact 
acagaattca 
gtgttcctgg 
aggaccaatt 
gacatttgct 
aagaccatct 
actgagtttt 
ttacattaca 
atgtatggct 
ggctcccttg 
tctgacaccc 
tctctcttca 
tctgctgaag 
ttgttttatg 
gagtccaaaa 
tatagcctac 
ttttgtaaaa 
acaaactgaa 
cactttagtt 
taatgttgaa 
ttctaataat 
cttagcctga 
tagtgaggct 
gtggctcaca 
caggagtttg 
attagctggt 
gaatagcttg 
ctccacttgg 
aaaagaaaac 
ttctgaaacc 
tataagttaa 
tcttattggt 
cacccttttt 
tcaattgcat 
ggaatcaata 
aaaggggtct 
agataagctg 
atttattaaa 
caccctcttt 
ctacatgtgg 
gatccttgat 
atgagaattg 
ttaggctgtt 
tgccttgtta 
ctcctgcttc 
tggaggaaat 
ttagggtatt 
aagatacaca 
atatttagcc 
ttaactaaca 
ctactataaa 
ctcggctcac
gtagcttgga 
gatggggttt 
cctcggcctc 
taggttttaa 
agatatcaga 
tgctaatatt 
aaataattac 
tttataaatg 
tgagatacaa 
ctgatgtgtt 
taaatggatg 
ttcagttttc 
ttctcttagg 
cgatctacct 
cccaactgca 
attcttccaa 
cctacgctgg 
acttccttgc 
gttccaggat 
tccttaatgg 
aaatcaatca 
gtgtcaaaaa 
tcattcttct 
gcaggcacaa 
gaaccctctt 
taattgcagt 
ggaacagaga 
ttgcagttta 
atggaaaaac 
tcttctcaaa 
attaataatg 
tctgtatgaa 
gagaaactct 
ccagataggg 
tctgtaatcc 
agaccagctt 
tgtggtggca 
aacctgggag 
gcgacaagag 
gaaaaaagaa 
tgtgaactta 
attttctgta 
tcatgtggta 
ttttttttga 
tccacgtatg 
gggtttatca 
tgatccagat 
caaagtgcag 
ggctattgca 
aagttttttt 
gtaggctgac 
attttagtgt 
ggacacctag 
tcctaaacta 
acctcaagac 
tttaacaccc 
agtgtttatt 
ctagttctca 
ttaatcaatg 
ctattaaatt 
ctatacagaa 
atgttcaagt 



tgcaagctcc 
ctacaggcac 
cacagtgtta 
ccaaagtgct 
gtgaaaatgc 
catggaggga 
aatatctgat 
caaatgtaga 
tatatatata 
aaagagacag 
tattgaaaaa 
aaacaaatgc 
aagaaataag 
actgacagac 
aatcacactg 
aacacccatg 
tgttactcca 
atgcttcaca 
ttcaatggca 
gtccaagaac 
gctctctcag 
tttctactgc 
gatggcaatg 
gtcctatctt 
agccttttct 
ctgcatgtac 
cttttatact 
tgtaatcctt 
ggcctgtgtt 
ctagtgtagt 
attaacactt 
ttttatttgt 
aatactgggc 
gaaaatccaa 
tggctaattt 
cagcactttg 
ggccaacatg 
catgcctgta 
gcggaggttg 
cgaaactcca 
aaaaattgtt 
ttgtaatcac 
agtattctca 
ataccagtta 
gttttcaagt 
ttttccagtg 
attgatattt 
cccagcaggg 
tgagaagaga 
ttacagagta 
aatggtcttt 
ggcatgacaa 
gtgcataact 
gttctcttgc 
taagcatctt 
agagttgatt 
tgaaacagta 
ttgcttcaca 
aagctactac 
tattatttag 
gcctcttatg 
aatgctacag 
ggttttccag 



gcctcccggg 

ccgccactac 

gccaggatgg 

aggattacag 

taccttaatt 

aactgaagat 

attcatttct 

attcaccaag 

agagacagaa 

aaaacacaca 

agtaattggt 

caaatctgga 

aagatgttgt 

gacccagtgc 

gcaggcaacc 

tatttcttcc 

aatatgctgc 

cagtgtcttc 

ttggatcgct 

atttgcatct 

acactgctga 

gctgatcctc 

tttgtagttg 

ttcatttttg 

acgtgtgctt 

gtaaggcctc 

tttttgagcc 

gccatacaac 

tatttgtaat 

tattatttaa 

tgaagattta 

cagagattct 

ttacttactt 

cccctttgaa 

tacattttta 

ggagccagag 

gtgaaacccc 

atcccagcta 

cagtgagtca 

tctccaaaaa 

tctacttcaa 

tgtgcaaaaa 

aaacccacac 

tggtcctcaa 

gcatggatcc 

aagtacctct 

gcatatgcat 

ggttcttgga 

tagttcatta 

aggtgttccc 

tatctacgta 

aatttatcat 

attattatca 

tgcattatta 

atgaacatgg 

ttaaaatgtt 

taaaaattaa 

caactgtctg 

agggaaaatc 

tcaaccagat 

tttgacattt 

aaaatgctta 

aatattaaac 
agatgactaa

gtctggttct 

tgggcagcaa 

caatggcagt 

tgggaactga 

tgggatattt 

ttcttttatc 

tggtctgata 

tgcctaaata 

gcaaccaaac 

ttttttttaa 

cccttgggta 

gaatagaaat 

ttttctttac 

tggagtctta 

cacaaaagca 

attaagccat 

taagagtgtt 

acagctaatg 

ttaaaattgt 

gcttcttagg 

ttaattaatt 

gacaaggaat 

acatgttact 

cagattttca 

atttttaatg 

aatgtagaaa 

taatcaacta 

atcttgcaaa 

ctgtacgtct 

aaatgttttg 

gaaagcttgt 

atcagcagta 

tagaaacaaa 

aaaggctaca 

gtatttttat 

cacaattgaa 

gaatatcatg 

ttgtgccctc 

gatttgtgat 

tttttttgtc 

tattttaatt 

agatactttt 

aaactgtatg 

taaagcaatt 

gcttaaggat 

gtgtctggta 

acactgttat 

acattttttt 

tcccagcaat 

cttcaaacac 

atcagagaat 

agcaaattgc 

gcaatgtatt 

aaaaataatt 

ttgattaaaa 

tgtaagatat 

acattgattg 

gtacaaaagt 

caacctaata 

ttcaatttta 

tttttatttt 

cttggcatgg 



cagtttttgt 
atccaaatct 
caatttaata 
ggcacaacag 
gtgacacagc 
ctttatctta 
ttcctctttt 
ctctatttga 
actgtatact 
ttaaactttg 
aatatgtatt 
aaagttatct 
agtgacaatt 
ttaaatgaaa 
gacacagaga 
gatttcttgg 
tcttcaacac 
actatagatt 
tgaattttcc 
cttaatatcc 
ctgcatttta 
tccaggtaca 
aagttttgtt 
ataatgcaca 
tttgatacta 
tcatatttta 
accaatgaac 
cctgaaaaga 
tcatgattcc 
ctctctaatt 
aacattttta 
aacaaacctg 
ggagctcttg 
ttctgtttta 
tataataatt 
ggagaaaaat 
taaacaagca 
aattatcctg 
acatttttaa 
gtcaccttta 
tgaattataa 
tctggtgaag 
agctatccaa 
tatagttgta 
aaaaaataag 
gaaatatttt 
agtgaggaga 
acaaatactt 
tcaacaaata 
tacccccaaa 
tgaacaaaaa 
aatatatctg 
tttaatttga 
tgccacaaac 
tgaagcacta 
tacaaaaatg 
tgtttgatat 
attaagggta 
aatcttgttt 
ttattatcat 
agaatctact 
ataatcaaga 
taccatgtaa 



aaaattgggt 
gcttttagaa 
tttacctttt 
aagcaggtga 
gtccttactt 
tgacaacatc 
ggttaggcaa 
aaactgtatt 
tccttttatt 
acacctttgt 
tatttattct 
cagaaattct 
gatgtaatca 
atgaagtagt 
gatgaaatga 
ttttaaattt 
acttcaaatg 
aaatagtggt 
ctgacagtca 
atgtgaaggg 
ttaatatatt 
gacaaatata 
tatattttgt 
aaacaaatat 
taatgtatgc 
attagcttgt 
catattggca 
gttgctctca 
atcatatcaa 
gttccttttg 
aaggtcttgc 
taccactaat 
tttatggtta 
atttgcacag 
ttctttcatg 
tatattttgt 
cttatatggg 
gatataattt 
ccaacttcct 
ttatacctta 
atatatttta 
atatttttgt 
atttttaact 
taattttgaa 
gaaaattatt 
gaatgagttg 
aaacacctaa 
atgttttcac 
tctctaaata 
agaactgaga 
aggattaatc 
tgcttcataa 
agataaaact 
aagataaatt 
aagaatcaat 
ctaaaattaa 
atctacattt 
taaagttttt 
ttttgccatt 
attaaggaga 
gaaggcataa 
caaagtaaac 
cagaaacctt 
tatattctta
acagtaactg 
agcactattt 
tctgtagaga 
ccttttagag 
atttaatctg 
agcattgtgt 
ttagaattgg 
ttctagttat 
aacccaaatg 
atttgtaatt 
ttggagatag 
tagtgtattg 
gagagaatta 
cttgcctaac 
tctctctttc 
ctggggggaa 
tattatgtga 
ggctaatgtg 
agaaaatttt 
ttcccaatta 
aaaaoataaa 
tatatttctt 
aaagttactc 
acctttacat 
gtgatcctaa 
tacataagaa 
gcattcacat 
tgtatcttag 
ttatgttccc 
taaacatttc 
gtttgaaaac 
tatgctactt 
attagaaatt 
ttatgatggt 
ttattgttgt 
gttattggct 
tgttgcatgt 
ccaagtgtta 
agtcaatcta 
aactaaattc 
cttcaatttt 
cagaaattat 
tgtagttgta 
ttactgtcta 
tttgttttcc 
ccttggtata 
cataccagtg 
cagtctcttt 
gggccacata 
agaaagaaaa 
cagtgggaaa 
atttaaaatt 
ggtaaaatta 
atataaatat 
cacacaaaat 
aaaagagctg 
ctgaaaataa 
aaaagtaaca 
caaaattttc 
gaaacttgtg 
acatgatgta 
tccatactgg 



agtgtcagtt 
aaaaagaaag 
gtgtatttgc 
caagagtgaa 
gcccagcatc 
tgggatagct 
tcaatttcta 
tacaaacaca 
gagaattgct 
cttgacctgt 
taatttttta 
aatagaacat 
ctaattttag 
ttatcttcat 
gtcatttggc 
taattcatac 
tattggagac 
tagtgagatg 
atcatttata 
gatatatggg 
taaaagcaca 
atttattaag 
ctcaataaat 
accatataat 
agattaaata 
tacaatatta 
atagaagcaa 
gtgttttatt 
tattttgatg 
agaagtgtaa 
tatggataaa 
aatcatttta 
tgatctctca 
gaagaatata 
gtcttacatg 
tttgtaactt 
ttcattgcaa 
tttaatgttg 
ttaagcactg 
tttctgccca 
ttatcattaa 
ctcttaggag 
aatattatga 
catacaatac 
tggtggccat 
agaaagtcag 
ttaattctat 
tatggtgaaa 
ttcttacaaa 
tgtttttcca 
taaatattta 
aatttgtagt 
aagaggcacc 
ttgaaataca 
caaagactat 
attagcaacc 
atatagattg 
atttaatatt 
gcacttattt 
ttatgctttt 
tgaagagttt 
ttaaatacaa 
aaacgtgtct 



ggttgaataa 
aaggcaattt 
ctggttttgg 
gcagaagagc 
atttcatctg 
ctttatatta 
gcaagtagag 
ctctatattt 
gtggcttgca 
ttgtaatctc 
attgtgcaag 
aaatggacat 
gattactgtt 
ttacaatatt 
cactcagtag 
tgaagtggtt 
tatgaattaa 
tatcctaaca 
tgcatgataa 
aaacctttat 
catatttata 
gacctgctat 
gtatatgact 
aatatgaata 
ttttatgaac 
atacatgtca 
tgcttacttg 
ccatttgatt 
accgacgtat 
ttgctgaatt 
tttttatcca 
ctataactct 
aaggaagttt 
tttaacaaat 
aagaatattt 
tttcttatgc 
tattttatgg 
taaattactt 
ctttcccact 
tttccactgt 
attgaacata 
attgaaaatc 
ttcaatttag 
ttatacatta 
agggagaact 
gaagatgctg 
ggatttatta 
tcatcatgta 
gaggtttttt 
ctatgagtca 
aatttaaata 
aaaaagagta 
ctacaatttg 
agaaattaat 
aagtaaaaaa 
aaaaatatga 
gcactttcat 
taataggttg 
tttttcttac 
attcagctat 
catggtgttg 
atagttcctg 
tagtttttgt 
ggttcttagt tcctacaaaa tcacaaagtg agaactgcct gcatttaata actggtaaaa 52560

tacaggcatc caccagatgg gatagaagct gtctctaact aagtagaatg tcattagcct 52620 

tgatcatgaa ggaaagtgta aagtacaggt gaaacgtttt ttgtaataga aagttataca 52680 

gccattaaag tgaggatgtt cctagttaag aaaaaggtaa ttcattttat gtaatatatt 52740 

atgtgcaaag atatatccta tgctgtatac agtgtgatat cttgtaaaat attagggata 52800 

aggggaaata aaacttgtta ttaatagcta tgtccatatt taaatagtca ttgctattgt 52860 

gagaaaataa ggcaacaaaa actgtgtatt ttcagtttag tccagatctc aggatgtata 52920 

attacaggat acttatttat taatattgga atttaaatat gaaagtaagg aaaagcaagg 52980 

aaaaagcaag gtaagaaaaa gaagaggtcc ccaattctgg tttccagcac agaatgacca 53040

atgaacaact tctcagagaa aaatgcccat gtgaaaattc cgtaacccat gggtgaggtg 53100 

gggcacactc ttggagcata aaaactggaa aaaaccacat tagaattgtt aagaggagca 53160 

gtttcacttg cttctttccc agccaaacac agtgtcacac tgaaagagat tccctgggcc 53220

tgtggtttcc ccagtgaaga aaagagagcc cacggcagac atccagcttc cctatgttcc 53280

agagaacttc ccaagaggcc cattctgtct aaccttgtgg gtaataacgg aagacttggc 53340

aaagcttggc caccccagtc agctcataac aaagttaagg ggtgtagctt acggcaacca 53400 

gcacccgatc ttggtggtgg catcatgttc tagctggcag ctttatgtga ccagagagcc 53460

cagccactag cattgctcac atgtggagct gagttggcag gccctttcgg tcaggaggca 53520 

tgttgaacag ttctgtctgg ctgtggtacc agtcagtaag ctggaccagc tgcagagcac 53580 

agcctagagc accacccaac cacaaagcac aacttgcagc ccaccccact caggtagaga 53640

tccaagccag caaccccact caaccgctgg gcatagcctg cagtcccatc tgacgacgga 53700 

gtctggtcaa ttatctctcc aaactacaga gcacatccag taaaacttcc caattatgga 53760 

tcagcttgca gccccaaaaa actgcaggaa atagccagtg gccctatctg gccaagagat 53820

ctggtcagtg atctcaccta aatttggagc aaaggcagtg accccatcaa tctgtggacc 53880 

acagccagca gccccactcg aatacatacc acagtgaatg gtcttacctg attatggagc 53940

ccagtcagca gccctgagtg attttggatc ctagccagca tactcaccca cctcagagca 54000 

caggcagcag tcagaggtca tgtgactaga gtcagctttc aaattcatca agtcctggtt 54060 

ccacctctta ctccctatga aatctcaggc aaattattta acctctactc ctatgcccca 54120 

atttgtttgt ttgtttgttt tcacagtact tatcatcacc tgccatatat ttgtttgtgg 54180 

gtgtatttgt catctatctc ctccagcaaa aaacattatc aataaagtag tttttaattt 54240

tttttgagca acagctatat tgagttatat agtggctata ctaatttaca tttttgccaa 54300 

taatgtacga gcattccttt ttctgtgtat ccttgcaaac ctctgttcat ttttttgtct 54360

ttttaatgac aatcattaca aattggttga ggttatactt cattatggtc ttgatttgca 54420 

tttcatttat gattagtgaa ttaaaccttt ttttcatata cctgctgaca atttgaatat 54480

cttctttttg aagcttttga taaaatccaa catctcttca taataataat aataaaaaaa 54540 

tactcaacaa actaggcatt gaagagacat acctcaaaat aataagagcc atctatggca 54600 

aacccacagc caacatcata ctaaatggga aaaagctgaa agcattccac ctaataaatg 54660 

gaacaagaca aagatgtcct ctctcatcat tcctatttca catagtactg caagttctag 54720 

ccagagcaat aaagccagag aaagaaatat aaggcatcca aataggaaaa gaagtcaagc 54780 

tatctctctt cactgataat atgattttat acctagaaaa ccctaaagac tacaccaaaa 54840 

agctcctaga tctgataaac aactttatta aagtttcagg atacaaaatc aatgtataaa 54900 

aatattagca tctctatact cccataacat tcaagctgag agctaaatca agaatgtaat 54960

cccatttaca atagccacac gcacaaaaat aaaacaccta ggaaaacaac aaaccaagga 55020 

ggtgaaagat ctctacaagg agaactacaa aacactactg gaagaaatca tagatgatat 55080 

aacaaatgga aaatcatccc atgtttatga attagaagaa tcagaatgct aaaacagcta 55140 

tattgcctaa agcaatctac agattcaaca ctattgccat caagctaatg acactattta 55200 

catagaattt gagaaaactt gcctaaaatt tatatggagc caaaaaagag actgaatagt 55260 

caaagcaatc ctaagcaaaa agaacaatac tggcagcatc aaataaccca acttcgaact 55320 

atactacaag gctacagtaa ccaaaaaagc atggtactgg tacaaagaaa gacccataga 55380 

caaaatgaaa cagtgaaccc agaaataaag tcagacatct acaactgtct gatctttaat 55440

aagttgatag taataaacaa tagggaaaga actacctatt caataaatgg agccaggata 55500 

actatttagc catgtgcaga agagtgaaac tggacccata cctgtcacca catataaaaa 55560 

ttaactcaag atggattaaa gacttaaata taaggcctaa agctataaaa atcctggaag 55620 

aaaactgagg aaataccact ccggacattg gcctcagcaa aaaatttatg aataagtctc 55680 

cagaagcaat tgcaataaaa ataaaaattg aaaaatggga tgtaattaaa ctaaagagct 55740 

actgcataga aaataaatta acagagtaag caatctaaag aatgaaagaa aatgttcaaa 55800 

aatgataaat ccaacaaaaa tctaatatcc agaatatata agaaacataa acaattcaac 55860 

aggcaagaaa cctcccccta aaaccattaa aacacatgga cacagggagg ggaacatcac 55920 

acactggggc cttttgtggg gtgggaggct aggtaaggga taccattagg agaaatacct 55980 

aatgtagata acggattgat gggtgcagca agccaccatg gcatgtgtat acctgtgtaa 56040 

caaacctgca cgttctgcac atgtacccca gatcttaaag tataataaaa aataggcaaa 56100 

agacataaac agacacttct caaaatgaag catacatgtg gccaaaaaat atattaaaaa 56160 

tacagtatca ttaatcatca gagaaatgta tatcacaact acaatgagat aacatctcac 56220 

accggtcaga atggctatta ttaaaaagtg aaataaataa cagacattgg tgaggttgtg 56280 
gagaaaatgg

agcagtttgg 
aaaaaggaac

cagtatggca 

tactaggtat 

tgttcatttt 

tgacagtttg 

agaagaagaa 

acaaactaat 

aaatgattag 

ggggagggtg 

tctggtgatg 

accttcacat 

ggtactatgc 

aatttaccta 

taaaatacaa 

caatcagatt 

cctgaatatt 

ttgtcttttc 

aatcctgttt 

ctttacctag

tttcaggtca 

atagtctttt 

ctgatttgga 

tgtatccttt 

tttcattcag 

ttgttgtttt 

tttttgaggg 

cagagatttc 
gtggagtact
aatccttata

aaatttctca 

ataaacccaa 

agcactatac 

gataaagaaa 

atgtcttttg 

gaacagaaaa 

catgaaaaat 

gtgcagactg 

cacatctatg 

tccctattta 

gacccccttc 

aaaaccccaa 

aatgggcaat 

attgacaagt 

agagtaaaaa 

ggtctaatat 

ttaaaagtgg 

aggcagaggc 

aaaaccctgt 

tcccagctac 

agtgagccga 

taaataaata 

acttctcaaa 

gattattaga 

gactattatt 

atttttatac 

attcctcaaa 

atacccacag 

ggtactattc 

gataaagaaa 

gatcgtgtct 

gctgaagcag 

aacttacaaa 

ggaggaggga 

aaataatctg 

gtatccccaa 

tcactacctg 

tgtaacaaac 

caaaaaagaa 

tttaaaaatt 

agtcccttgt 

actctataga 

atctatattt 

accaatgtct 

tacatttgtc 

atatgtctga 

tcttctttct 

caaataaact 

ttttactctg 

atagttcctc 

aagtgtttag 

ggtaagttgt 

gttgtttgcc 

ttgagagatc 

catgatttta 

ccagtatgtt 

ccgcagatgt 

ttttcagacc 

tgtggctttt 
caatgtgggg 
tcattaggca 
tgacataaca 



cacttttggt 

aagaacttaa 

aggaaagtaa 

actgtagcaa 

atatgataca 

cagcaacata 

ccaaatgtgc 

aaaggagaac 

aaaaatgatc 

accatctgat 

ataaatggtg 

cctataccac 

actgtgaaaa 

gatttcatga 

aggatctaat 

gacagcctat 

tcagcatgtt 

gctaagatgc 

aggtggatta 

ttctactaaa 

tctagaggct 

gattgcacca 

aataaataaa 

agaagacata 

gaaatgcaaa 

aaaaagtaaa 

tgttagtggg 

gagctaaaag 

aattataaat 

acaatagcaa 

atgtggtgca 

tttgcaggaa 

aaaaccaagt 

cacaaagaag 

taggagcaga 

tgcaacaaac 

aactaaaata 

ggtgatggga 

ttgcacatat 

aaaagatgaa 

tatttatttt 

cagatgaatg 

ttacttcttt 

ggttttgttg 

aaaagtattt 

tttaaaacat 

ggaattgatt 

tattttcttt 

tttggtttta 

atttagttat 
gtctctgttt
ttggtattga

tttcttaatt 

ccaggtgcag 

ctattaggtc 

ttgatgacct 

taggtctttt 

tgtgtatatg 

gtgccctttt 

atagcaaccc 
gggaatgtaa
aacacaagta 
ataattttac 
agacatggaa 
tacacactat 
gatgcagctg 
ccatgttctt 
aagcaacact 
agaccaatgg 
cttcaacaaa 
ctgggataac 
acacaacaat 
ccctaggaga 
caaagacaca 
taaacttaag 
ggaatagaag 
taagaaactt 
tggcaacagt 
tctgaggtca 
aatacaaaaa 
gaggctggag 
ctgcactcca 
taaataaaat 
catgcagcca 
tcaaaaccac 
aaaataacag 
agtgtaaact 
cagaactacc 
cattctacca 
aggcatggaa 
tacacaccat 
catggatgga 
accgtatgtt 
gaaacaacag 
aaagataatt 
tcccatcaca 
aaagtttttt 
tcatttgtat 
acccctgaaa 
aaagagatat 
gaaattgagt 
atttgtaagt 
tgctgtgcag 
cctgtgcttt 
tcccagagtt 
tttgattaga 
gtaatgtcac 
gttaatctag 
ttggtttttg 
ttattttctt 
gttagatcat 
tttcctttta 
ttatttattt 
tcagtagcaa 
tttttatttt 
gattgacact 
aggacaagaa 
caattaatca 
aaggctgtaa 
tgtcagtcta 
tgtaggatag 
tgtcattttt 
ctgctgtctt 



attagttcat 
ccatttgacc 
caaaaccaca 
tcaacgtaga 
tgaatactat 
aaggccatta 
attaaaggta 

gggggttaat 

aacagaatag 
gctgacaaaa 
tggctagcca 
agattatttt

atggatggat 

aaatttcttg 
ctctgctttc

aagccaggct 

ttcatctctc 

tctctttgtt 

tccatctgag 

accaccactc 

agccctccaa 

ttgggtatct 

cacatcactg 

attcaccgtt 

tcatggtgga 

ggggaagtgc 

ctaggggaat 

gtcccacctc 

ccaaaccata 

tctttgtctt 

ttgaatttga 

ctgagaaatt 

tccatttgga 

ataggcctgc 



tcatcctttt 

cttatttttt 

aagacagcac 

ttttccattg 

agagtattat 

tttctaaatc 

ttctttagaa 

ttttgtgggt 

gataatgact 

atgccctaca 

tctaaactgt 

acaagtggtt 

acgagggaat 

gactgaagaa 

agcttttgtt 

ttaatgcaca 

accatttcct 

ctgtattact 

gtgaagaggt 

ttcaggaaac 

gacagaggag 

agaactctat 

tgatccattc 

aggcttaggt 

catgttcttc 

actcattcca 

aatctcatct 

tctaagatac 

tggccaaaat 

aaatcttaaa 

atgcaagggt 

agccctgttg 

gcacaagctg 

agatccatta 

ctctgcattg 

ctggatatcc 

attcatgtct 

cttgcaccct 

gttggagtgg 

gggtctggcc 

tgtaataaag 

tgtagctcct 

tttttctttt 

cttttaaata 

aattcttgaa 

tcaaattcaa 

aaagcatagc 

accacctcag 

aacaagtctc 

actgttccaa 

ataggaatgc 

taaagaacta 

ccttaggctg 

agggcaaagg 

tacacacttt 

ggtgctaaac 

cagcagtagg 

tcagtctcca 

tgatttttcc 

ttggtgagca 

ttcagttatt 

tatttcaatt 

tttacttttt 



actttgagcc 

atctaacttg 

atacttgagt 

agcagtcact 

tgttaggtaa 

ctttatttct 

gtatgttttg 

accgtgaggt 

taattttgat 

cacattttga 

agttattgat 

tatttaccac 

tttatacatt 

ctctctttag 

tttctgcgaa 

taatattctt 

cttgacctgc 

ctgttctcac 

ttaattgaca 

ttacaatcat 

agagggtaaa 

cataagatag 

acctcccacc 

gggaacacag 

tcacatttca 

gcacaaactt 

gagacaaggt 

aagggagtta 

aaaggagcta 

gctccaaaat 

tgctctctca 

gctgctttca 

tgagtggagc 

cgcaggaccc 

cactagtaga 

aggcatttgc 

tctgcacacc 

ctgaagcaat 

ctaggacgca 

cccgacatca 

gtctctgaaa 

ctttatttat 

ctaccacatg 

taaattccaa 

tgctttgctg 

agttccacag 

aagagttacc 

cctggacttc 

taggaagttc 

cctctgcttg 

cccacttctc 

actgagattg 

tataggtagc 

gaaaacaagc 

taaataacag 

cattagaaat 

agacggcaaa 

ttaaatgtga 

taatttgatt 

atggtctttt 

atttcaataa 

atgcaaaagt 

tttttttcag 
tatgtgagat
ctactctagt 
attgattttt 
cagcgtcttt 
gagtttacta 
tgtgttctat 
attccatgct 
ttacaaaaaa 
tgcaaataaa 
cttttggtgt 
ggttttaata 
aattatagta 
ccaatgtttt 
catttcttga 
aaaaaaatct 
ccttgaaaga 
tagcttctgt 
attgctaaac 
cacagttcca 
gatggaaggt 

ggggaaagtg 

cactagggga 

aggccccacc 

agtcaaacca 

aaacgcaatc 

aaatgtccaa 

ccacaggtaa 

caggcagtgg 

catatgccat 

aatctccttt 

aggcctgggg 

caggctgatt 

taccattctg 

agtgggaact 

ggctctctgt 

atacaacctc 

cacaggccag 

ggcccaagct 

ggtcatcatg 

ttttttccta 

tttcttggag 

gcaaacttct 

ttcaggctgc 

tttcagacca 

cttagaaatt 

ctctctagag 

tttactccac 

actctccata 

caaactttct 

ttacccaatt 

tggtaccaat 

agtaatttat 

atggttgaga 

acatggcaag 

atattgtgag 

cacccccatg 

tttagatgaa 

tatttttctc 

acagtgtaac 

ctacctagat 

gtatagtttc 

taatgtactt 

ctcctctgag 



agatctcttg 
atatgcctgt 
gtttgtttgt 
taattgtaga 
ctgccatttt 
cttactgtat 
atgtttttgt 
aatctatagt 
agcatcaaaa 
ttcaatgtac 
gttttgtatt 
tgacagtatt 
cctgttacat 
aaaacaggtc 
ttatttcacc 
ttttttctct 
gagaaatctg 
aaaattacct 
caggctgtac 
gaaggggaag 
ctacacactt 
atggtattaa 
tccaaaactg 
tatcattctg 
atgtcttccc 
ttcctcatgt 
gcctgtaaaa 
ataaatgctt 
gcaagtccaa 
gactccatgt 
caactccacc 
ttgagtgctt 
gtttctgaaa 
cggtgtgggg 
gaggattctg 
taaaacttag 
acactatgtg 
gtaccttggc 
ttctgaggct 
tttgccctct 
gcattttccc 
gcagccttga 
aaatttttca 
tctctttgaa 
tcttccacca 
caggggcaca 
ttcctaataa 
ttactattag 
cctcttcctg 
acaaagtcga 
tttatgtatt 
gaagaaaaga 
aggcctcaga 
aggagaaaaa 
aactctaaca 
atccaatcac 
atttgggtgg 
ttgctgattt 
ctgagatatt 
gctattgtgt 
taggcctttt 
gattgtgtcc 
aagttagttt 



aaaattgcag 
ctttataggt 
ttttcttttg 
attttgtcta 
gttatttctt 
tactctgtag 
gtatctgtca 
ataacagttc 
accaactcta 
atgtttttat 
ttattcttca 
ctgattttga 
gttagtaccc 
tggtgttgat 
ttcatgttcg 
tatcatcttg 
ctgaaagcca 
tagcctgagt 
aggaggcatg 
caagcacatc 
ttaaacaaac 
accattagaa 
ggggattaca 
tcccggtccc 
aatagttccc 
ccaaatgtcc 
ctaaaaacaa 
ctgttccaaa 
aatccagaag 
ttcacatcca 
ctcatggctc 
gtggcttttc 
gatagtgtcc 
gctccaaccc 
cccctgcagc 
gcaaaagttc 
gaacttgcca 
actttttagt 
acacagagca 
gagcctgtga 
cactgtcttg 
attcctccct 
aactttcatg 
aatagagaag 
gatacctgaa 
atgccaccag 
gttctacatc 
tattttgttc 
tcttcttctg 
ttccacattt 
agtccattct 
ggtttaatcg 
aaactaacaa 
cagagcgaag 
caaggcaaca 
ttcccaccag 
agacacagag 
gaatattatt 
actgtttgta 
ttctccagaa 
tatctctcat 
cataattttc 
caaatgttct 
atcttcaagc

ctataatccc 

gaccagtctg 

gtttgtgatg 

ttgaacccgg 

gcaacaagag 

gaggcaggga 

tggctttgga 

ttccctgaga 

gaagcatttt 

attaaatctg 

ggttcccatc

gtgcagtaca 

gtctaaagtc 

ccaggatcca 

ccctcccaca 

ttcaggcata 

tttcagattt 

aaaaaaaaaa 

gattaatatt 

ttattttcct 

ttctttaaat 

ttcttgaact 

aattgttcag 

ttggttttta 

ttccagtttt 

ttttttttta 

tactgcagcc 

gctgggatta 

aggttttacc 

ttggccaccc 

cttaaatgtt 

aatttctctt 

aaatactgtg 

aatcccagtc 

ggtttgtgga 

ctatgacact 

accaggctcc 

cagatgattc 

tactgatgaa 

cctccttagc 

aaagggggta 

ctgcacttct 

ctagttttat 

ctttctgact 

ccttccatgg 

agcataatta 

aagtaacact 

ccacctctca 

aaccttttta 

catttttatg 

cacaaaatgt 

tggggggcat 

gagcaaagaa 

ctcattacat 

attattcctc 

tttagataat 

atcactaatg 

agacattcag 

tgcaaatgag 

cataggttcc 

gatagcttag 

ctagactgct 



ttactgattc 
agcactttgg 
gccaacatgg 
atgggcacct 
ggggcagagt 
cgaaactctg 
gtcagcgtca 
ggtagaatgc 
acctccaaaa 
gggctttgga 
tgatagttta 
ttcaaaggct 
gttggctttc 
cagacgaacc 
ctgagtgtag 
gggctcaacc 
taattaaatt 
gtttttattt 
aagatttcta 
tttaaacatt 
ttttttttag 
atttttttaa 
tctttaagag 
ggtccattgt 
taatcttcgc 
cgcaggtgtt 
agacagagtc 
tctggctccc 
ccggcataca 
atggtggcca 
aaagtgctag 
gacttgttgt 
ctgtcattgt 
ttgggactat 
ttttaattgt 
aaatccggct 
aatcggtctg 
atgcatcagg 
aattctccta 
gattcctaag 
agacataggt 
gtgcaaccgg 
cctctgagtt 
ttttatgggg 
ttaccctcca 
agctcataca 
gtttcctaat 
catcactgct 
catcttgcta 
atcaaatatc 
ttctttattt 
ttgcattttt 
gaatcagttt 
gtaggaaagg 
gcttactata 
ctaaaaactc 
ttgacactgt 
gggattagat 
tatattagag 
cgagatatta 
agagccaccc 
ggaattgctg 
gacaagttgg 



tttcctctgt 

gaggccgaga 

tgaaacccca 

gtaatcccag 

ttgcagtgag 

tctcaaaaaa 

tagcaatggg 

gggcagcctc 

agaaagcagt 

catccagaaa 

ttatggcagc 

tcccctctgc 

gagcagggct 

ttgaacaccc 

agcagagcaa 

agctaacctc 

tgcataaatt 

cctgagtttg 

atgagttttt 

gtttttattt 

ttttctgttt 

attccttatt 

gcttatgctg 

tgagcctctg 

atatttatgt 

ctttgttggt 

ttgctctgtt 

aggttcttcc 

gcactacgcc 

ggctggtctc 

gattacaggt 

tgcttctgct 

ttcctagtgt 

attgctgtgc 

tttggggcca 

aagaattctt 

ttaaatgtgg 

tactgcattc 

gcactcccaa 

ctggagggga 

ctagggaaat 

aaatgaccgt 

ctgttgtatt 

acagtgatgc 

aaagtgtaat 

acaaacaatg 

cccagagtcc 

attccttagt 

taaatatgac 

aaggaaatga 

ttggttaaag 

attattaaaa 

ctcctctgaa 

aggttagcta 

tgcaaatcat 

catttctcat 

ggtggtggta 

tcataaaatt 

taggcccagt 

accaagttca 

agaacttttg 

cttttgggta 

ccattagtgg 
catcaaaact

agggtggatc 

tctctactaa 

ctacttgtga 

ccaagatcgt 

agaaaaagga 

tgagaaagat 

caaaacctgg 

cctgccgaca 

tgcaagacaa 

aacagaaaaa 

ttgccgtcca 

tggattgaaa 

cttaattgcc 

cattcctcaa 

tttccaccct 

caaagagcat 

ctaaggaaat 

aagttcagtt 

ctctttccaa 

tcttataatt 

tcctcagaga 

aattcattgt 

ttgatttctt 

cgatgcctgt 

cttagatctt 

acccatgaat 

agcgattctc 

tggctaattt 

gaactcctgg 

gtgagccact 

ggggtaggct 

ggggaagtct 

tgtcattgtt 

ggctggggga 

gctgtactgc 

catctctact 

agtgtcctat 

aggtttttat 

gactgaacat 

tctttgtgag 

ttcttttaca 

tacaatggag 

taggggattt 

cctcaaaatt 

atacatttga 

tttaattaaa 

ccctctgaga 

tgcattttaa 

gtatatatca 

gggaaatata 

atcatttttt 

aaattctcag 

tgaaaagtgt 

ttttctaagc 

tttatagctg 

cgatttgtta 

gttacttttt 

atcatgtaat 

attagaaagt 

tttgcttggg 

tcactggtca 

acaggtatat 



atgggcacgg 
atctgaggtc 
aaaaatacaa 
ggctgaggca 
gccatcgcag 
tagaggaggt 
gaaagatgtg 
aaaaagcaag 
ccttgattct 
aatttgtgtt 
gaatatacct 
gtccacagct 
attcagcatg 
cctcccgtcg 
ggtgtcctca 
ttgaattgtc 
atgctgcaat 
aataaaaaca 
tttgtattct 
tttcttattt 
tcttttttta 
tctttgttct 
cagacattta 
ttggtggtgt 
gcatttgaag 
tagttcttag 
gcagtggtgt 
ctgcctcagc 
ttgtattttt 
cctcaggtga 
gtgcctggcc 
tatagtgacc 
tattttgtgc 
tccaattccg 
ggcttcatga 
caactgtgct 
aattacagtg 
tctttgttct 
gagagaggac 
ccaattccat 
tggcattata 
ctccacaact 
ctcccatctt 
tctattccat 
tcaagatttt 
aattatgaca 
cacaagaggg 
tttgctttaa 
agcttagaga 
gaaccagaat 
ttaatgtatt 
tcttttactt 
tgccttctct 
taactccatt 
actttaaatg 
ggaacactaa 
ctgatgaagc 
gaatactaaa 
ctaatttatt 
tagttggtgg 
agaatctctg 
tttgcatctg 
tgagagttga 



tggctcacgc 
aggagtttga 
agaattagct 
ggagaatccc 
tctagcctgg 
tgggaagagg 
tggccattgc 
aaagtagatt 
agcccagtaa 
gttttaaggg 
gtcaaaactg 
ccacttgcag 
taaggaagaa 
agagcaggga 
cccccaggca 
tctctctccc 
cccattgtaa 
aaacctcctt 
ttatttctgg 
tcttcctgaa 
acatttttct 
gacctgtaat 
atagatcttt 
catatttccc 
agagagacac 
tattcttttt 
gatcttgggt 
ctcccaagta 
agtagagatg 
tccacccgcc 
agtattcact 
actgcaacta 
actgaagctt 
gaaagttatt 
aagaaccttg 
tcctgtattt 
ctgagtagcc 
cagttcacct 
cagagtggat 
ttcccccctt 
ctagcttggg 
ttccttgatt 
tgaatagctt 
catctttctt 
tttttgtatg 
atatgactga 
atgattccag 
gagcaggtct 
agcacctaaa 
taagtccttg 
aactgaatct 
tgcatatcat 
cagaggctgt 
tgtaacaata 
agtaaactat 
cgtaagacag 
tagaatcaaa 
atatttactc 
ttgcaaaaat 
cagagctgag 
agcacacttg 
atcataggta 
tttaagtgtt 
gcatctgtca
aatccatctg 
ttccacttag 
ttagaacatt 
cctcatacat 
tcttcttcct 
aaaactacat 
atattttcaa 
tgcttctact 
aaaatcgtat 
ctaaaaatgc 
ccctaacaac 
tatcactgat 
taggaagagc 
catttccatc 
cagtgggtgc 
gcgcaagggg 
gaaaatcagg 
gccagattat 
ctagcacagc 
gccattgccc 
ccaccacagc 
agacaaacaa 
gaagagagca 
gacccctgac 
ctcacacggc 
actaacaaac 
accaaaagta 
taaaaagcag 
aagctggacg 
caagctacgg 
agaagaatgt 
gaaaaccaag 
ctggaagaaa 
tttagagaaa 
aaaagaccaa 
ttggaaaaca 
aacatccaga 
ccaagacaca 
gccagagaga 
tcggcagaaa 
aagaattttc 
ataaaatact 
aaagagctcc 
aatcatacca 
aaataaccag 
aaagtaaatg 
gtcaagaccc 
ggctcaaaat 
gggttgcaat 
aagaaggcca 
atatatatgc 
aaagagactt 
acattagaca 
ctgcaccaag 
atattttttt 
cctctactca 
gcaatcaaac 
ctgaacaacc 
atgttctttg 
aaagcagtgt 
tccaaaattg 
tcaaaagcta 



aggttaacac 
tgtactttac 
ctcctagcac 
tccaatacca 
tctcctttcg 
ctctagtttc 
tagcaaatat 
atattttaag 
cttcatttta 
tttatcctgc 
atatatgtaa 
agactcacta 
gaaataaaaa 
tcgggtctac 
tgaggtactg 
gcacaccctg 
tcagggagtt 
tcactcccac 
atcccacacc 
agtctgagat 
aggcttgctt 
tcaaggaggc 
aaagacagca 
atggttctcc 
ccctgagcag 
cgggtactcc 
agaaaggaca 
gataaaacca 
agcgcctctc 
gagaatgact 
gaggacattc 
ataactagaa 
ctcgagaact 
gggtatcagc 
aaagaataaa 
atctacgtct 
ctctgcagga 
ttcaggaaat 
taattgtcag 
aaggtcgggt 
ctctacaagc 
aacccagaat 
ttacagacaa 
tgaaggaagc 
aaatgtaaag 
ctaacatcat 
gactaaacac 
atcagtgtgc 
aaaaggatgg 
cctagtctct 
ttacataatg 
acccaataca 
agactcccac 
gatcaacgag 
cgtacctaat 
cagcaccaca 
gcaaatgtaa 
tagaactcag 
tgctcctgaa 
aaaccaacga 
gtagagggaa 
acaccctaac 
gcagaaggca 



gaaacattct 
tttccttgca 
acatcatttg 
ttaaagaaga 
ctgaaagtaa 
caacagcata 
tttcctttgc 
tcaccttgaa 
aaaattttgc 
agtcctctag 
tcttctattt 
aaattcgctg 
acaaaacaag 
agctcccagc 
ggttcttctc 
cgcgagcgga 
ccctttccta 
cccaatactg 
tggcttggag 
caaactgcaa 
aggtaaacaa 
ctgcctgcct 
gtaacctctg 
cagcacacag 
cctaactggg 
aacagacctg 
tccacaccga 
caaagatagg 
ctcctccaaa 
ttgacgagct 
aaaccaaagg 
taaccaatag 
acatgaagaa 
aatggaagat 
aagaaatgag 
gattggtgta 
tattatccaa 
acagagaacg 
attcaccaaa 
taccctcaaa 
cagaagatag 
ttcatatcca 
gcaaatgctg 
gctaaacatg 
accatcgaga 
aatgacagga 
tccaattaaa 
tgtattcagg 
aggaagatct 
gataaaacag 
gtaaagggat 
ggagcaccca 
acattaataa 
acagaaagtc 
agacatctac 
ccacacctac 
aagaacagaa 
gattaagaat 
tgactgctgg 
gaacaaagac 
atttatagca 
atcacaatta 
agaaataact 
gttggccctg
aataaatgat 
tatttcaatc 
aggaaggcaa 
tatcattttg 
aagctttgta 
atctccttat 
ggttgcttct 
ttgtttccac 
gacaaattta 
ctagtagcag 
tgatattttt 
aaagacaggg 
gtgagcaaca 
actagggagt 
agcagggcga 
gtcaaagaaa 
cacttttctg 
ggtcctacac 
ggtggcagcg 
agcagccctg 
ctgtaggctc 
cagacttaaa 
ctggagatct 
aggcaacccc 
cagctgaggg 
aaacccatct 
gaaaaaacag 
ggaacgcaat 
gagagaaggc 
taaagaactt 
agagaagtgc 
tgcagaagcc 
gaattgaatg 
caaagcctcc 
cctgaaagtg 
gagaacttcc 
ccacaaagac 
gttgaaatga 
gggaagccca 
tgggggccaa 
gccaaactaa 
agagattttg 
gaaaggaaca 
ctaggaagaa 
tcaaattcac 
agacacagac 
agacccatct 
accaagcaaa 
actttaaaac 
caattcaaca 
gattcataaa 
tgggagactt 
aacaaggata 
agaactctcc 
tccaaaactg 
attataacaa 
ctaactcaaa 
gtacataacg 
acaacatacc 
ctaaatgcct 
aaagaactag 
aaaatcagag 



agtctcacat 
ctagtgaatt 
atttattttt 
tgaaaaggag 
aaaacagact 
tgtctctgag 
ttcaagtaaa 
taaaaaataa 
ctttacaaac 
actttggggt 
agtttgaact 
caaatagtat 
gaggagccaa 
cagaagacag 
gccagacagt 
ggcattgcct 

ggggtgacag 

acgggattaa 
ccacggagtc 
aggctagggg 
aagctggaac 
cacctctggg 
tgtccctgtc 
gcctcctcaa 
cagcgggggc 
tcctgtctgt 
gtacatcacc 
agcagaaaaa 
tcctcaccag 
ttcagacgat 
gaaaactttg 
ttaaaggagc 
tcaggagccg 
aaatgaagtg 
aagaaatatg 
atggggagaa 
ccaatctagc 
actcctccag 
aggaaaaaat 
tcagactaac 
cattcaacat 
gcttcataag 
tcaccaccag 
accggtacca 
actgcatcaa 
acataacaat 
tggcaaattg 
cacgtgcaga 
tggaaaacaa 
aacaaagatc 
agaagagcta 
gcaagtcctg 
taacaaacac 
cccaggaatt 
accccaaatc 
accacatact 
actatctctc 
accactcaac 
aaatgaaggc 
agaatctctg 
acaagagaaa 
aaaagcaaga 
cagaactgaa 



ttcaatctag 

tatatatttt 

cttcaccata 

acggagatag 

acgggtccca 

gacaagatag 

tgagggactt 

ctgtttatta 

aaaaatggac 

actctgtttc 

caattaattt 

tttatggaat 

gatggccgaa 

gtgatttctg 

gggcacaggt 

cactcaggaa 

atggcacctg 

aaaacggcgt 

tcactgattg 

aggggcgccc 

tgggtggagc 

ggcagggcac 

tgacagcttt 

gtgggtccct 

agactgacac 

tagaaggaaa 

atcatcaaag 

ctggaaaccc 

caatggaaca 

caaactactc 

aaaaaaattt 

tgatggagct 

atgcgatcaa 

agaagggaag 

ggactatgtg 

tggaaccaag 

aaggcaggcc 

aagagcaact 

gttaagggca 

agcggatctc 

tcttaaagaa 

tgaaggagaa 

gcctgcccta 

gccactgaaa 

ctgacgagca 

attaacttta 

gataccaaga 

gacacacata 

aaaaaggcag 

aaaagagaca 

actatcctaa 

agtgacctac 

cccactgtca 

gaactcagct 

aacagaatat 

tggaagtaaa 

agaccacagt 

tatgtagaaa 

agaaataaag 

ggacacattc 

gcaggaaaga 

gcaaacacat 

ggaaatagag 
acacaaaaaa cccttcaaaa aattaatgaa tccaggagct gtttttttga aaagatcaac 71460

aaaattgata aaccagtagc aagactaata aagaaaataa gagagaagaa tcaaataggc 71520

gtgataaaaa atgataaagg ggatatcacc accgatccca cagaaataca aactaccatc 71580 

agagaatact acaaacacct ctatgcaaat agactagaaa atctagaaga aatggataaa 71640 

ttcctggaca catacactct cccaaactaa accaggaaga agttgaatct ctgactagac 71700 

caataacagg atctgaaatt gtggcaataa tcaatagctt accaaccaaa aagagtccag 71760 

gaccagatgg attcacagcc gaattctacc agaggtacaa ggaggagctg gtaccattcc 71820

ttctgaaact attccaacca atagaaaaag agggaatcct ccctaactca ttttatgagg 71880 

tcagcatcat cctgatacca aagccgggca gagacacaac caaaaaagga attttagacc 71940 

aatatccctg atgaacattg atgcaaaaat cctcagtaaa atactggcaa accgaatcca 72000 

gcagcacatc aaaaagcttg tccaccatga tcaagtgggc ttcatccctg ggatgcaagg 72060 

ctggttcaat atatgcaaat caataaatgt aatccagcat ataaacagaa ccaaagacaa 72120 

aaaccacatg attatctcaa tagatgcaga aaaggccttt gacaaaattc aacaacactt 72180 

catgctaaaa actctcaata aattaggtat tgatgggacg tatttcaaaa taataagagc 72240

tatctatgac aaacccacag ccaatatcat actgaatggg caaaaactgg aagcattccc 72300 

tttgaaatct ggcacaagac agggatgtcc tctctcacca ctcctattca acatagtgtt 72360 

ggaagttctg gccagggcaa ttaggcagga gaaggaaata aatggtattc aattaggaaa 72420 

agaggaagtc aaattgtccc tgtttgcaga cgacatgatt gtatatctag aaaaccccat 72480 

catctcagcc ccaaatctcc ttaagctgat aagcaacttc agcaaagtct caggatacaa 72540

aatcaatgta caaaatcaca agtattctta tacaccaaca acagacaaac agagagccaa 72600 

atcatgagtg aactcccatt cacaattgct tcaaagagaa taaaatacct aggaatccaa 72660 

cttacaaggg acgtgaagga cctcttcaag gagaactaca aaccactgct caatgaaata 72720 

aaagaggata caaaccaatg gaagaacatt ccatgctcat gggtaggaag aatcaatatc 72780 

gtgaaaatgg ccatactgcc caaggtaatt tacagattca atgccatccc catcaagcta 72840 

ccaatgactt tcttcacaga attgcaaaaa actactttaa agttcatata gaaccaaaaa 72900 

agagcccgca tcgccaagtc aatcctaagc caaaagaaca aagctggagg catcatggta 72960 

cctgacttca atctatacta caaggctaca gtaaccaaaa cagcatggta ctggtagcaa 73020

aacagagata tagatgaatg gaacagaaca gagccctcag aaatatcgcc gcatatctac 73080 

aactatctga tctttgacaa acctgaggaa aacaagcaat ggggaaagga ttccctattt 73140 

aataaatggt gcagggaaaa ctggctagcc atatgtagaa agctgaaact ggatcccttc 73200 

cttacacctt atacaaaaat caattcaaga tggattaaag acttaaacgt tagacctaaa 73260 

accataaaaa ctctagaaga aaacctaggc tttacctttc aggacatagg catgggcaag 73320 

gactttatgt ctaaaacacc aaaagcaatg gcaacaaaag ccaaaattga caaatgggat 73380 

ctagttaaac taaagagctt ctgcacagca aaagaaacta ccatcagagt gaacaggcaa 73440 

cctacaaaat gggagaaaat tttcacaacc tactcatctg acaaagggct aatatccaga 73500 

atctacaatg aacttaaaca aatgtacaag aaaatcaaac aaccccatca aaaagtgggc 73560 

gaaggacatg aacagacact tctcaaaaga agacatttat gcagccaaaa aacacatgaa 73620 

aaaatgctca ccatcactgg ccatcagaga aatgcaaatc aaaaccacaa tgagatacca 73680 

tctcacacca gttagaatgg caatcattaa aaagtcagga aaaaacaggt gctggagagg 73740 

atgtggagaa ataggaacac ttttacactg ttggtgggac tgtaaactag ttcaaccatt 73800 

gtggaagtca gtgtggtgat tcctcaggga tctagaacta gaaataccat ttgacccagc 73860 

catcccatta ctgggtatat acccaaagga ctataaatca tgctgctata aagacacatg 73920

cacaagtatg tttattgtgg cattattcac aatagcaaag acttggaacc aacccaaatg 73980 

tccaacaatg ttagactgga ttcagaaaat gtggcacata tacaccatgg aatactatgc 74040 

agccataaaa atgatgagtt catgtccttt gtagggacat ggatgaaatt ggaaatcatc 74100 

attctcagta aactatcgca agaacaaaaa accaaacact gcataccctc actcataggt 74160 

gggaattgaa caatgagaac actggacaca ggaaggggaa catcacactc tggggagtgt 74220

tgtggggtgg gggagggggg agggatagca ttgggagata aacctaatgc tagatgacga 74280 

gttagtgggt gcagcacacc agcatggcac atgtatacat atgttacaaa cctgcatgtt 74340

gtgcacatgt accctaaaac ttaaagtata ataataataa agaaatgggt cttctctacc 74400 

caagaacaca cacacacaca cacacaaaaa aaaaaaaaaa aaaacaagaa agacaaacag 74460 

tatgctcatc acacacaaat gcaaggtcat ttttaggcat tagatggggc ttaatttaat 74520 

ctaagttttt catgtacttt gtatcttcta acctgcatac ctctatttcc tcttctagcc 74580 

ctctacctct ggtaaccact gttttatttt tatcactgta tatttaattt tttaaatatt 74640 

ccccatataa gtgagatcat tcagtatttg tctttctgtg tctggtttat ttcacttaga 74700 

ataatgtcct ccaggcttgt acgtgttgta tcaaataaca cgatcttcct tttcagggat 74760 

gaataatatt ctattgtaaa tatataccac catttcttta tctatttgtc cttcatgagg 74820 

cacttgggtt gtttcaatac cttagctgtg atgaataata ctgcaaaaac atggaagtac 74880 

agatgcttta agaggtggtg aattcacagg ataaacaagt ctagagatct aatgaacaac 74940

atgaggacta ggggtaataa aattataccg tatttgggat tcgtgctaaa tgaatatatc 75000 

ttagctgatc ttgcacacag acacacaaaa agggtaacta tatgttaatg tttgtgttaa 75060 

tttgctttac tatagtaaga ttttactcta tgtatcacat gacattatgt tgtgaacctt 75120 

aaatagatgc attaaaattt attttttaaa agggttcatt aactgcggtt caggcaatct 75180 
ataaaccttt
aggggttcat 
gaattcttcc 
ttattcactt 
gattacggcc 
ttttaattgt 
atggcttgtg 
gtgcttagaa 
acataatttt 
ccttatcctc 
ctcaaatccc 
ggctccaacc 
gtgtgtgtta 
tcagttagac 
ccagtgtatc 
gagtggtgga 
aattggtctt 
acgaactccc 
gtctgctgtt 
aggcctcggg 
aatgcaacat 
gtctgtgggt 
tgctcctgta 
tacaacttgt 
gcagaacctt 
caccttttat 
cacatcttag 
attgtgtatt 
ctcacaggca 
agtccaattt 
ctcatttttc 
ataggaactt 
gtgaaatatc 
gtgtgtgtgt 
agaaccatta 
ctgagacagg 
gacacagctt 
ttatacattt 
gtgcagaaca 
taaatttaaa 
agtctgtgtt 
taaccgcctg 
ctttatttat 
gttgttgcca 
ctggagtaca 
cttctgcctc 
tttttgtatt 
tgacctcagg 
atcctgcctg 
aactgtattt 
ataatagctc 
gtttgttcgc 
cttgtttcat 
ttattatttc 
attactgaag 
atagcttcta 
gtcacatagt 
atggtgcact 
atatctcaag 
ggtgatatgt 
tcttacctca 
cctttttgta 
tctaaaactt 
tacattttgg
ttgaacaagc 
taagtagttt 
gtgattgaat 
tagtgcattc 
tgagagttgg 
tgttccaggt 
gaccatccca 
tttgtacaag 
ctcgcatggc 
taggggtgga 
ccacagcagc 
cagtgtgatc 
cctctgcatt 
ggaaaaattg 
ggtggatctc 
cccctggagt 
ctcggtgtct 
gtgctcctct 
tttttttggg 
ttgggtgcaa 
ggagccctca 
tcaatattgg 
aactttttcc 
cccaacatac 
tctgtcctga 
ctgtagtctt 
tctggatccc 
gttattaagc 
cctactgtgt 
tttttgtaac 
gaaatgagta 
tgtagtttag 
gtttgtgtgt 
catctggtat 
tctcagttag 
ccagaggtcc 
tagagagaca 
actcaaagtc 
cattttctgg 
atgttaatgc 
aactagtctt 
ttttggttta 
taaactttac 
gtggctgatc 
agcctccgaa 
tttagtagag 
tgacccacct 
gccaaacatt 
cctttatttt 
tctctgtcca 
ttctgcatct 
gaatatctat 
tatttacaca 
taactctgta 
cttgatattt 
agaacatgac 
atgtaagaat 
gccttccagg 
attgcagcag 
gattctcaga 
cgttgctttt 
acagtaaaat 



aggtgagatt 
aaacaaacga 
tcttgactgg 
agcctacaaa 
tcacctggaa 
ggactttcct 
atggttgggt 
agcttggttt 
gattcttgca 
aagtgacagg 
atgcagacag 
gtctagggtt 
tttcagcttt 
accacaaggg 
gatcacatgt 
aatgagatgg 
cgggcagctc 
gcatcgttcc 
gctcctgtca 
cacaggatgg 
aaacaggaga 
ccagggaccc 
attctgtata 
tatccttgac 
tgacctagca 
cacaggggtt 
tttacctctc 
ataaaagcca 
cacatgcaac 
ctttttmaat 
cactgaaact 
acacttaacg 
ggagcttagc 
gcacatacac 
ccttgtattt 
tttagaaagt 
tgaggacatg 
taatctatca 
caggttgggg 
ttgacaattg 
tggagaggaa 
tcaagttaaa 
cacttcttat 
tttttttttt 
tccgctcact 
gtagctgaga 
atggggtttc 
gcctcagcct 
acttttttaa 
tctccacaac 
ctagaatggg 
atagtgccca 
tttgatgccc 
tgggaaaaca 
atccagggct 
cttagatata 
taagtgtaag 
gccatgacta 
tctatacata 
cagctaataa 
ttgtatgcac 
ctcaatagta 
aaatgaggtg 
ttcagtatct
agaaacaagc 
tttcttttct 
attgatctag 
taaaagcctt 
tctcattctg 
catggtcatg 
aatgctttga 
gttttatttc 
agtgtggctt 
tcaggtgtac 
aagtgtttac 
gctgtcttca 
cagcaggctt 
ggacatggag 
atggggagcc 
atttgccgat 
accatcactg 
acatccagcc 
ggcctgtggt 
gcctgttctc 
cacccttctc 
ctatgtatct 
ctactcctct 
agcccaggtg 
agacctactg 
tccacttatg 
atgagcctct 
aaataagtta 
ctgaaattcc 
gtactctggt 
tgagaaagat 
tggattttct 
acacaaattt 
gctcgttatc 
ttattttgcc 
tgttgaaggt 
atcaatacat 
cagggtggcg 
gtttgtctaa 
taatgaggca 
ttttacaagc 
gatatccttc 
tttttggtag 
gcaacctctc 
ttacaggcac 
tccgtgttgg 
cccaaagtgc 
tgaggtcttc 
agttatttga 
aaaaccaaaa 
atcacacaaa 
aaaatgacat 
cagtgattca 
cctaaccagc 
tcacatacag 
gagaatgtgt 
tctgtgtccc 
caattttatg 
aggaagtttt 
tagtagtgtg 
acagcaagac 
actcattttg 



gcatttatct 
ttaagaatct 
ttttctgaaa 
acaattaggc 
cctttggagc 
tattttcagt 
cagtctattc 
tctcatggtc 
gtattgaaat 
gcttctttgg 
aggtcgtggg 
agctcctgaa 
ggcggcttgc 
tctgtattcc 
gatgagtgca 
agaaagggga 
gatgctctga 
gtctgctggc 
acttgtgtcc 
gggccagagt 
acttaggtcc 
tacacatcac 
gaggtcccag 
gctcgttctt 
ttccttattc 
tggttttcct 
agcccttctt 
cagtcaaaac 
ttctgcaaat 
cacatcctga 
gaaggccact 
ttgaagtgaa 
gcactttgca 
cactcagtga 
cttctgtaga 
aaggttgagg 
ggtcatggta 
gtaagattta 
tgtcgtttcc 
agacctgaga 
tgtccgaccc 
cctggctgag 
ttagaagttt 
aaccttattc 
catcctgggc 
gtgccaccac 
ccaggctggt 
tcgaactaca 
cttgactttc 
atattattta 
aaagcagaaa 
atattgagta 
gagaagagga 
ttcagttgca 
tcctaacatc 
gtactcaagg 
ctggcacaat 
cagaagttcc 
aaacagttta 
taaaatacat 
acctcggttc 
agagttttga 
agatgagaca 



tggattttca 
ttcttacaat 
cagtcttttc 
tcttgaaatt 
atcttctatt 
gtcagttttt 
acaaggcctt 
ctgaaatttt 
aggagagttt 
tgccccactg 
agtgtttttg 
gccccaatgg 
attaatcagc 
aagttcttgc 
aggttttatt 
cggagtggga 
ccacccccag 
atctgctggt 
atgcccacta 
ggtcttggaa 
ttgggcaaag 
ttcccaaccc 
ttctaggttt 
cctctatcat 
aagaggtgtt 
tctttgtaca 
ctctactgca 
tgtttcagat 
gaactcctct 
tatattttct 
gatactcttg 
taagagtaaa 
ttaagtgtgt 
tagtttccaa 
cccaggaaat 
acacacctgt 
cagcttgctt 
ccttggtttg 
aggctgtagg 
tcaatagaaa 
ccatttctct 
gaggaagtcc 
tcttgctaca 
tgtcgccagg 
ttgagcaatt 
acctgactga 
cttgaactcc 
ggcatgagcc 
ctttccccaa 
tttacttgaa 
tttttgaaag 
tcagtaaata 
aagataaata 
aagcttttaa 
aagatatgtc 
gcacaagtca 
cctgtttatc 
ttatcatcat 
gctttagaaa 
aattttcagt 
aattatttaa 
ttataggcat 
agaattgaga 
agcctgtaat aaaaatttca agtatgaaag
aaaggaaaat gtgaattagc aaagattctg 
ttgtaatacc tcttgttaga tacaagttgt 
tattaattta ttttctttct aaacacatta 
atatgaaaat ttattttatt tcacagttct 
gccattctga ctgtgcattt tgagctatga 
ataatcatta gagatgaaga ttccagtacc 
tatgggctaa taaacaagct tcccaagaat 
aattctcctg agaacatgac atcacgcaag 
ttgcaagtca ttcattctta catgctctcc 
aatatgtgtt gagaagtata attaaaaaat 
attatgtgat gagaaagtat atactttata 
gagaaatata gaatactctg gaaataaaga 
aatggatgag aaatgcttca caaagaagtg 
gaattaggtt ttaaaaagaa caggaagata 
tagatagata gatagataga tagatagata 
acaggtgtag gtatagatat atctataggt 
cctaggcaaa tgatataatg acattgtaga 
taccaaaacc aaagtttatg gtagtaccca 
atgtggagaa aaactaattc aaatgaacat 
ctttatttcc atagcttgaa gagcaaactg 
aatcacagaa ttcattttac ttgggctcac 
tgtgctgttt ctggttgttt acctcgtcac 
aatgagactg gactctcgcc ttcacacgcc 
tgtggatttg tgctatacat caaatgcaac 
gaagaccatt tcctttgctg gttgctttac 
cactgagttt tacatgctgg cagcaatggc 
tctgcgctac agtgtgaaaa cgtccaggag 
tgtctatggc ttctcagatg gactcttcca 
tagatccaat gtcatcaacc acttctactg 
ttctgatact tatgtcaaag agcatgccat 
ctccctcacc atcgtcttgg tgtcctatgc 
atcagcagag ggaaggcaca aggcattctc 
cctgttttat gggactctct tttgcatgta 
ggaatctaaa ataatagctg tcttttacac 
ctacagtctg aggaataaag atgtgaagca 
tgtcatgacc atggtgatgc ctttgtttcc 
acatgtccta gcgttctgat ggtgagtttt 
tcagctaaaa agctcatgct gggtaaaaat 
atattccatg aatcagcagc atgagctctt 
agtctgcacc tcaggtgcac tgtatttaaa 
gataagcaca ctgaattttg aggagcactg 
actgttgtgc tctttttgtt tacaacggca 
cttgaatgta ttctattctt attctcgcct 
ccattcttct gcactccaag aaatccattc 
agaccagaaa ctaggggcca tggtgatggc 
gaagagtggc aatggaaaaa gaggggaaga 
tatggatcca gctctctgag ttacataaac 
ggaggtcact ttctgtttcg cctgattttg 
ggactaaaat taaaatggga aatgaaagaa 
atccaccttg ctttgaatat tccttctatt 
tttttgtttt ccttagcact ggcctgttta 
acaggaagag aagacattgc tctggcatct 
ctctttagac attatgctaa aaagaattat 
aagatagagt ttcactctgt tgcccaggtt 
ttcctccaac tcctaggctc aagtgaccct 
acaggtgtgt gctaccacac tcggctaatt 
ctatattgcc caggcaggtc ttaaactcct 
ccaaagtgct gggattatag gcataaatta 
aaactttcta cttattgcct caaaatacta 
aaattctgcc attacgccat tagaaaataa 
ttgtacatta agtaactgta tttataaggc 
tttgttgtta gttttacaag gaaaaaagag 
aatgggcaaa

atgtgaagag 

gattaaaaga 

agtcattgct 

ttgttcattt 

tcttggaaca 

ccacataaga 

tgcttctaca 

gctacatgga 

tttttgcatt 

gaacttcact 

atgatacaat 

agagagaatt 

gcctttaaag 

gaagagatgt 

gatagataga 

gtagagaaag 

cttacaaaat 

gaaatagata 

taacatggta 

tcaggaaatg 

agattgcccg 

cctgctaggc 

catgtacttc 

cccgcagatg 

acagtgctac 

ctatgaccgc 

agtttgcatc 

ggccatcctg 

tgctgacccg 

gttcatatct 

cttcattctt 

cacctgtggt 

tataagacca 

ctttgtgagt 

ggccttgaag 

taataaacat 

aatattctct 

gagatttttc 

ccttggaggt 

tatgtgtttt 

ctgtgggtga 

aacaaaataa 

gctgcttcag 

ttctgtactc 

ttctggtttt 

gaatgattgt 

ccaccttccc 

aaaaatttaa 

ccgagtaaac 

tgtaattaag 

tatgtgtcca 

tcaagaaact 

ttaaggaaca 

ggagaacagt 

cccgcctgag 

tttaaatttt 

gggctcaagt 

ctgtgcctgg 

aatattttct 

aaagataaaa 

ttctgcattt 

ccccaagcta 



aattaaaaac 

aggtaaaaag 

ttaaataaca 

ctaaacatcc 

atttggcata 

cttgcattag 

atttctgcaa 

ctaagtttca 

gataaatctg 

ttaagaaaat 

ttgtccctgc 

gggggctaaa 

cagtgacacg 

tgggtcatgt 

agctcagtca 

tagatgatgg 

aatatattag 

tatagtaagt 

actttcgatg 

tgtgcatttg 

tccaacacaa 

gaactccagt 

aacctgggca 

ttcctcacta 

tcgactaata 

attttcattg 

tatgtggcca 

tgcttggcca 

accttccgcc 

ccgctcatta 

gctggcttca 

gctgccatcc 

tcccatatga 

ccaacagata 

ccggtactta 

aatgtcctga 

taaatcgaaa 

gtgagtctat 

taggctttgc 

tgttacacgt 

atccaaactc 

aacgtggcat 

atgtgctccc 

cagagatgtc 

ctttcctgac 

tataagtgct 

acttttctta 

attctgagcc 

gcaggctgat 

aataggtcat 

tttaggtaag 

aggaagtaag 

ttagaaagta 

agtaaaaaat 

ggcatgatca 

cctccctagt 

tttgtagatc 

ggctctcctg 

tcaagattct 

aaatatcttt 

tattttgtgt 

tgttcctgtt 

tctaagtcaa 



ttctcgaaag 

ccaaaaggat 

actctgaaaa 

tcaagatact 

tgtttattga 

aatcctacag 

aaacctgtga 

ataatcactt 

tgagtaaaaa 

ttagtataaa 

tctcaacacc 

taataaatta 

tattaggaga 

agcatgaata 

tatagataga 

atagatagat 

taatgtaatg 

ctaccataat 

atatatgaag 

tttatattgg 

atggcagtgc 

ctctgctttt 

tgataatgtt 

acttagcctt 

tcgtatctga 

cccttctact 

tatatgaccc 

catttcccta 

tgaccttctg 

agctttcttg 

acctctccag 

tccggatcaa 

tggctgtcac 

agactgttga 

atccattgat 

gatgaaatat 

tctttggctc 

gttgagtgtc 

tcctccacat 

acagaatcaa 

ctagatgatt 

gccctggaac 

agcccaattt 

tttaagaaac 

ttgctgtggt 

cttacatagt 

aacttgagtt 

tccagtgtat 

ctgagaataa 

acgatagagt 

taatgttaat 

gatttcttcc 

gaaagaattc 

atattttttt 

tattaataac 

agctgggacc 

tgatatctgt 

cctcagcctc 

tctactctcc 

attggaaggc 

atatattaag 

gttgtttggg 

caacggcaat 
80280
81300

81360 

81420 

81480 

81540 

81600 

81660 

81720 
81840

81900 

81960 

82020 
ttatgacaaa
ggtaatcaac 
ttcatatcca 
agcaaccctg 
ctctaaaatt 
tgagttttat 
agaggtagtg 
tgtttgtttt 
ttggctcact 
gtagctggga 
atgggttttc 
ctcagcctcc 
attttagggg 
tgtggcatta 
gaagtgggat 
aatttcaggg 
atgtaattac 
ctttttctca 
tgacttttgt 
agaaatttta 
tgtgaattgg 
atggagtatt 
tccaaaaagc 
ataattattg 
ggatataatc 
agcatgttcc 
tcttctcttg 
acttaaaatt 
aacaattatg 
tatttaagct 
gaattcaaag 
cttaacataa 
gagatggccc 
aaatacaaat 
tagaagggat 
tgatttcctg 
aaccaaaaga 
aagagatagt 
cagtcgttcc 
aatgaaaatt 
gttgcaaact 
atccctggga 
cagaactaga 
attcaacatc 
aaaataataa 
ttggaagcat 
tttaacatgg 
atccaaatag 
ctagaaaacc 
aatcaatgta 
aatcaggaat 
actaactaga 
agagatgaca 
cttgattgga 
aaggagatta 
ggcacaatct 
aggttagact 
ctcaaacatt 
agcttgcaga 
acaaactccc 
atatatatat 
gctcatagat 
gattcaatgc 



tatatggtta 
caaaatgctg 
tttgttacta 
ctcatttgtc 
tacttcattt 
ttatttttcc 
aaaacaaata 
gagacagagc 
gcaacctctg 
ttacaggcac 
gccatgttgg 
caaagttctg 
gaacatattt 
ttaagattcc 
tgttttatta 
gcataaggaa 
aattaacaac 
tctgatcaag 
aggtagaagt 
cttaatttca 
ggtggtggct 
ttttctttaa 
aaactaacta 
caaaatttat 
aagattcttg 
aggcagtgta 
gataacaggc 
aagattgtga 
atggtgcaca 
ggcatcctag 
cagtaattag 
caactaaata 
aaaaaacaaa 
aaccatcaga 
ggataaattc 
aacagaccaa 
aatcccagta 
accatttgta 
ttaaggccag 
ttaggccaat 
taatccatga 
tgtgaggttg 
ggcaaaaact 
ccttcatgtt 
gagccatctg 
tccccttaaa 
tattggaagt 
gaagagagga 
caaaaacttc 
caaaaatcac 
gcaatccaat 
aagatgaaag 
caaacaaaag 
ttgaaggatg 
acatttgagt 
aatcagctga 
ggcttagcct 
gaactccatg 
tggcctattg 
atatatatat 
gtatatccta 
aaggaaaatt 
tattcctgtc 



tatttcacag 
aactctaagg 
ggatacataa 
ttaattagag 
catctgcaaa 
ttcatcattg 
tcaaggcctt 
cttgctcttg 
cctcccgggt 
ccaccaccat 
tcaggctggt 
ggattatagg 
aatgtatcat 
atttaaaaat 
agtatcaaat 
gattttccca 
catgtatttc 
gcaccatggc 
aagcattttc 
cactatattc 
gaaggatagt 
tgaagaaata 
tagtcagatt 
taattttaaa 
ttttcaatat 
ctgagtgttg 
tggaggggaa 
caagtgtcat 
gattttactg 
agatgattag 
aactaagcac 
atataaggag 
attataaatg 
gactattatg 
ctggacacat 
taatgagctc 
ccagatggat 
ttgaacctat 
catcattctg 
atccttgatg 
gaacatcaaa 
gttcaacata 
atgtgttatt 
aaaaactctc 
tgacaaaccc 
aacctgcaca 
gttagccata 
attcaagcta 
ttcagctgat 
ttgctttcct 
tcacaattgt 
agcactacaa 
ggaaaacatt 
caaagtattg 
cagtgaggtg 
cagtgtgccc 
cccaatctac 
ttcttcagct 
taggaccttg 
atatacacac 
ttagttctgt 
aatattatta 
aaactaccaa 
cttctaaaga
ccaagagaaa 
taattgagct 
tagaattttt 
ttaacagatt 
cagacttaga 
gttcctcatt 
tcgcccaggt 
tcaagtgatt 
gcctgactaa 
ctcgaactcc 
cgtgagccac 
caggttgtat 
ttgacatatt 
tatcttaagt 
ctcatccaac 
taatttcaca 
tccaacatcc 
tgcattgtgt 
atttctgtgg 
aataatatgt 
tgctgttctg 
atttggattt 
aaatcacgtg 
atttactcaa 
tgtatacagt 
gatatatctt 
aaataagtag 
taggtagaag 
ggaataacta 
cacatcgaaa 
gctgctagct 
accaaggaga 
agcacctcta 
acccttcaca 
caacattgaa 
tctctgccaa 
gcaaaaaatt 
atatcaaaac 
aatattaatg 
agcctaatct 
cataaagtta 
tcaacagata 
agtgagctag 
acagccaaca 
aggcaaggat 
acaattaggc 
tgcctgtttg 
aaacaacttc 
atacaccaat 
tagaagaaaa 
tgagaattac 
ctgtgatggt 
atccttgatg 
ggaaaggcag 
agaatataaa 
atctttctcc 
ttgagacttg 
ccttgtgatc 
acacacacat 
ccctctagag 
caatggccat 
tgacattctt 



caagagtaca 
ccgtatgtct 
taattatgac 
gtttgtttct 
ttattccttt 
ctgatattat 
tcatttgtgt 
tggagtgcaa 
ctccttcttc 
tttttgtatt 
tgacctcatg 
catacccagc 
atcattagtt 
tgttcacgca 
ttcaggctta 
atagacttat 
tgtgcgggat 
tcatctatgg 
ccataatcta 
tcaaaaaaat 
ttaggcaatt 
cttcatccat 
tatgagaatt 
aaccaaaact 
aaaacattta 
gaatataaaa 
aaacaaacaa 
aataaggcaa 
atgtcttctc 
gaagaagttt 
tgatagaaag 
agactaatat 
tgttaccact 
tgcacacaaa 
agactaagcc 
tcagtagtaa 
attctaccag 
taggaagaat 
ctggcaaaaa 
caaaaatctt 
atcacaatca 
aatgtgattc 
cagaaaaggc 
gtattgaagg 
tcacacagaa 
gctgtctctc 
aagagaaaga 
cagtcaatat 
agcaaagtct 
aactgtcagg 
ataaaatact 
aaaatctgct 
taataatgag 
tgtctgtgag 
acataccctt 
gtaggcagaa 
cgtgctggat 
gactggcttc 
atgtgagtta 
atatatacat 
aaccctaata 
actacccaaa 
catagaatca 



gctgaatctc 
cattgtatat 
ccttgcatcc 
tctataaatt 
ccaatattta 
aatgtcaaat 
tttgtttctt 
tggcacaatc 
agcctcctgg 
tttaatagag 
atccacccac 
cttgctcctt 
tgattttttt 
tctgttgtta 
atgacagtgt 
actgagagag 
gctttgttct 
gatatcttgc 
ctttttaatt 
caaataaaat 
tgggtaaatc 
ttattgcaat 
tcataatatt 
aaacacttgt 
tggataactt 
tacagtttct 
aataaactgc 
caggagagag 
ttagggttga 
gggagaaata 
atctcaacaa 
gaagaaaaga 
gaacccctag 
cgacaaaatc 
aggaagaaat 
gtagcctacc 
atgtacaaag 
gattcttccc 
cacaacaaaa 
caacaaaata 
agtaggcttt 
atcacacaga 
ttttattaaa 
aacatacctc 
tggggaaaag 
accactccta 
aataaagggc 
aattttatat 
cagggtacag 
caaacagtca 
taggaataca 
caaaaaaatc 
tgagtgtcaa 
ggtgttgcct 
aatctggatg 
aaacatgaaa 
gcttcctgcc 
cttgcccctc 
ataatactta 
atatatacat 
cacattacat 
gcaatttaca 
gaaaaaaagc 



82800 
82860 
82920 
82980 
83040 
83100 
83160 
83220 
83280 
83340 
83400 
83460 
83520 
83580 
83640 
83700 
83760 
83820 
83880 
83940 
84000 
84060 
84120 
84180 
84240 
84300 
84360 
84420 
84480 
84540 
84600 
84660 
84720 
84780 
84840 
84900 
84960 
85020 
85080 
85140 
85200 
85260 
85320 
85380 
85440 
85500 
85560 
85620 
85680 
85740 
85800 
85860 
85920 
85980 
86040 
86100 
86160 
86220 
86280 
86340 
86400 
86460 
86520 
tatttaaaaa tttataggga accttgaacc caaatagcca aagcaatcct aagcaaaaag 86580

aacaaacctg gagcatcacg ttacctgact tcaaactata ctataaggct acaggaagca 86640 

aaacagcatg gtactggtgc aaaaacagac aaataaaaca atgaaacaga attgaaaggc 86700 

cataaataag accacacacc tacaaccatc tgatttttga caaagctgac aaaagaacag 86760 

ggggaaagga caccctattc aataaatggt gctaggctga ctggctagcc atatacagaa 86820 

gattgaaacc ggacctgttt cttacaccat acacaaaaat caactaaaga tgaattaaag 86880 

acttaaatat aaaaccctga actgtcaaaa ccctgaaaga caacataggc aacaccattc 86940 

tggacatagg aactggcaga gatttcataa tgaagacacc aaaaacaatt gcaataaaag 87000 

caaaaattga caaatttttg gatataaata aacttaaaag cttctgtgaa agaaactttc 87060 

aacagagtaa acagacaacc tacagaatgg aataaaatat ttgcaaacta tccatctgac 87120 

aaaggtaata tccagcatct ataaggaact taaacgaatt tacaagaaaa aaatgatccc 87180 

attaaaaagt gagaaaagga catgaataga tacttttcaa aagaagacgt aaatgggagc 87240 

taaataatga aaacacatgg acacaaaggg gagaacaaca gacactgggg cctacttgaa 87300 

gatggaagat gggagtaggg aaaggatcag aaaaaataac tgtttggtac taggcttagt 87360 

atgtgggtga tgaaataatc tgtacaacaa acccccatga cgtgagttta actgtatcaa 87420 

aaaccttcac atatacccct gaacctaaaa aaaaaaaaaa aaaaaaaagg ttaaaaaacg 87480 

gctctgtgaa ggttctgagt ttgaaaaggg cttgacccaa gaccagtatg gctataccag 87540

ggaatgaggg agacagtgac agataatgct gatttaggtg tagactatag taaaagtttt 87600 

tttttttttt tttttttttt ttttttgagg cggagtttcg ctctgtcgcc caggctggag 87660 

tgcagtggcg cgatctcgac tcactgcaag ctccgcctcc cgggtttacg ccattctcct 87720 

gcctcagcct cccgtgtagc tgggactaca ggcgcgcgcc accatgcccg gctaattttt 87780 

gtatttttag tagagacggg gtttcaccgt gttagccagg atggtctcga tctcctgacc 87840 

tcgtgatccg cccgcctcgg cctcccaaag tgctgggatt acaggcgtga gccaccgcgc 87900 

ctggcctata gtaaaagttt ttatttcggt atatcattgg aaaaatgtgc ttccaatgaa 87960 

gcattcatgt atctcttttg ccttcataat tagggcaaat acatgtacta tttgatcaca 88020 

tgtacaggtt acatcttata ttcatatgtc ctagctaagt gaggattgga tacttggtaa 88080 

aagggaaaaa aaaggtaggc aaaatatttt gatctttatt ctgtttgttt tcataggttg 88140 

gcaaataaat aacattttgc taagttacac attcaccagg ttttccataa aatattactt 88200 

cccaacacct ggattaaata ccgtgaacta actcaactac aacaattttg aggtccaaat 88260 

tattttaccc agtagatttt acaatttaac ttctttattc taaccatttt gtgaaaggta 88320 

gtgtccaaca cacaattcag gctgaaagag aatcacagaa acatcaacgc ttatgaatta 88380 

tgtcattatt tttttacctt tattaatcat tccattcatc tgattggaat aatgaaccat 88440 

aatgcaaaac cccatgtgaa taaactcctg atccccaaag caatcatctc gtagtgatta 88500 

caaattgttg caatcaagtt gcttaagtcc caagtttcta ttgagcacat gagtctgtac 88560 

caattactac attttccctg gagttgtgtt ttttagtaac attgcaattg tttcttcaga 88620 

gtgtattaaa aaatacttta gtgataagag ctccaaatta caagatgggt ataacttatt 88680 

cttgttaact cattttttgg agcactagtg tctgtgggag tacctttctt ggcatcaggt 88740 

aggaataaaa attctgactt ctgcacttgt actttctctt cttgtctcaa ttttggattt 88800 

ttatcagtag gagtcaagtg acaataaaag tagcttcaaa tgttagttta gattctcagc 88860 

ctattaaaat gagtttgtat acatttctta tctagacggg tgatgagaaa gtattacact 88920

ttaattgaaa ttattcacaa caactgtttc actttacaaa tatcattctc accattcatt 88980

caaatattga acccacttga ataactattg gagtaaacat taatatgaca ataaattcaa 89040 

gtgttgaaca caccttgaat aaccactgga gtaaatatta ataagacaat caattcaagt 89100

gtttactaga ataatatctg ttgaaaagac cccttataaa ggaaacgctt tatcaaaagt 89160

tcaaaagaga tcttattttt ttatggctga atagtatttc attgtatcat agatggacac 89220 

aatgaaatat catagacatt ttacattaag taaaataagc caggaaaaga atattaaata 89280 

ctctatgttc tcactcgtac gtggaaacta aaaaatgttt atgtcagaga agtaaaaagt 89340 

agaaaagagg atactagagg cttcctagag gctaggaagg gtagagagaa ggaaggatag 89400 

gcagaaattt gttaaaggat acaaattata gcaaggtaag agggataagt tctactgttc 89460 

tatggtcctg taggattact gtagttcata atatatagtt tcaaatagct agaaaaagga 89520 

tagtgaatgt tcccataata aagaaatgat aaatatttga cataagggat atgctaatta 89580 

cccttatctg atcactatac atcatatgca tcaaaacatc actgtgtacc cactaaagat 89640 

gaacaagtat tatatgtcat ttaaaaataa agtaaaataa aggggaaaaa tttgaaagag 89700 

ggacagatca attcattaga aattgtactt tttctagaaa acataaaaca cttatagaac 89760 

tttacatatc tttctttaaa ttatcagaga aaatttggag caaatttaat attaactaag 89820 

aagtacatga ctagtgatga gtgtagaata gaaaaatcta tgaatgcaaa atcagaatgt 89880 

tagaaatgat gggcacttta gaaactattt tgtcaaagtc tgacattata tagattagaa 89940 

ttctaaagct cataggaaaa aatcaccata tcgaggtaaa gagactgctt tctaaatgaa 90000 

gacttaaatg agtttttatt gacatatcat aggcaagaat aaagccacaa catataacat 90060 

gccctagaac ttaggttcat gcactgtgct tagataatgt gccctgttat tttatttgtg 90120 

agacactaga atggaaaaca aagagaaagg gatggtgctg aagtccattg tgactgtgta 90180 

gacaaggaag gtgctcctgg ctgagaaact gatattcaca ggagttgctt ggctccacca 90240 

actactagac ttatctacct ctgttttaat tgaagatcgt aaaatatcat tatttgccca 90300
gcagttcttg cagacttatc tatgtcttat tgctcagttg aggtcgttcc tgggctcaat 90360

aactgaaaat aagcttcagt aatctttacc tctgttttct atggtctttt ctttggcttc 90420 

aatttctcct agtctcccac acaattcatt ttataggatg acataatggt gacctgatgt 90480 

atacagaatg tttagtaact atgaaagcag ttccgtgcta tcttattttg ctatctcaaa 90540 

tatttgtatt ctattttatt tcattttatt ttatgagaca ggatctcact gtgtcaccca 90600 

ggctaaagtg cagtggtgca atcatggcat actgggccca agggagcctc tcacctcaac 90660 

ctccccagta gctgagacca caggcatgca ccaccacacc tggagatggt tttttttttt 90720 

aatatttgtt tttgtagaga tggagtctcc ctatgttgcc cagttttgtc ttgaactctt 90780 

aggctcaagc ctcagcctac caaagtgcta ggagtaaagg catgagccac attgcctgac 90840 

cttgcatttt aactggagcc tgaagaagcc aaacttattc ccaaatgaac atttccacaa 90900 

gtaactcaga tatcagtaac atttgtagaa atttactttg cacttttatt gaatcaatgt 90960 

aatttttttt ttagttggga ggatgggaaa atcttggaac tccgaatttc acccaaatat 91020 

ttaccatgtt taccaattta ccataaatga cacctctagt tttctgttca acaatatatt 91080 

caaacagsct actaaatgac aaataatgtt acttgaaaaa tatcttcatg taataaarat 91140 

aaacatatgg atatagatat cctgtgaatc aaaatttacc aacatgraaa atatgtgttg 91200 

actttgttat ctagtttatc tttagtaatt ccaatacagt gcgaagcaag ttaacattat 91260 

tgaagcagtg aaacatttgg agttttattt tggttatatt tcttgcaata agaaacatag 91320 

actttaaaaa actcactgat gtctacactg attcctaatt acccagaaat ttcagactga 91380 

acaccaatgt catggaaaaa cccctttgca gaaggacatc tggtttaatc catcattgag 91440

gcttaattat ttttctgtgt gttatttctc acacagagaa caatgtatgt atagctatta 91500 

caaaactgat tcagcactct ttctaatgct ttaatgtgaa ttacttggca atctatattg 91560 

gaatacaagt tagaaaataa gagtcatgaa agataaattt tgctactttt tacacaaata 91620 

ttaaataagt gtgcactttt tctaagcaca gggcacactg cgcatggtaa acattaaata 91680 

agttatcaaa ggcaattcct acaggcatta aacatcacct ttgatacatt tcagcaaccc 91740

agtttaacca ttggcaaata aataacattt tgctaagtta cacattcacc aggctttcca 91800 

taaaatatta cttcccagca cctggattaa ataccatgaa ctaactctac tacaagaatt 91860 

ttgaggtcca aattatttta cccagtagaa gtttaaataa agaactagaa atgatcttca 91920 

gatgacgtgt tattaaacat ttcacaatca ttcagctatt tattttcagt agactaaagg 91980 

ctttagctct gatgcagaaa atttgaatat atcatgggag ttatcaaaat tattttaaat 92040 

ttaccaatag tatttgtcat atgcatgctt ttagtaatct cagttgagca tatgcccccc 92100 

caacatatat atacacatac atgaatatat atagaggtgc ttgtatgtta cactgaggcc 92160 

ctctagaagt gaaacttgaa tgttttaata tgactaagtt cataatctcc tgcccaggaa 92220 

attataaatt aaaaacattc cagatctcaa agagcttttc tttactgaaa tattgagaaa 92280 

gacatttccc aagtacttta gcctttaaaa tatgattcat gcaagcaact aaattagaga 92340 

gttcttggga agagggctgt ggataacaag agaaaagtga gaagagaaga aactaaggat 92400 

gtgtgatcaa agaacaggtt aacatgtaaa aaattcaaaa gaggaaaaaa atgcttgcaa 92460

gttatatgag agtattttaa atatttggac gttgtctcat ggattacatc tatttatcta 92520 

tctatctata tctatctatc tatctatcta tctgtctatc tatctctcca tctatctatc 92580 

atctatctgt ctgtctatta gtttgactta tgccttacaa ccaataaaca cttaataagc 92640 

aaccatgatt atttgttaaa tatgatatga tgaataaaca aatgctagta tttaggattt 92700 

cacaatacag attgtttgag agtagaatgg attcacgtcc acctgcaatt gatacgcaga 92760 

gctctgatat cacattctgt tttattttcc tcagagttta acattttttt ctcctgtttc 92820 

tattctctcc ccatttatgg agtgcctctt cagtgctcag tatttacccc aagcattttt 92880 

tttcctataa tcaatgatag ttctctacta gtggcacggc tgaatgaagc tggtttccca 92940 

aattgtggag attatgtctt acttccttga attttgttta gtttttgagg ttcagggatg 93000 

acagcattcc ctgctgctca gtgtgaaatt agggatttta gcctaaagag aaaatttgat 93060 

agtgctcttg ggtctttgaa taaggaaggt tgggacatgc tgaaagcctt tgcccacttt 93120 

ctggagcttt actttacaca tgtccatata gaaggatctg tgatgttctg caatagagca 93180 

acctatttta cctttcccca cttaaaccca gtatttccaa actttttttt ccccataggc 93240 

ctattttgtt gcttaatatt cagtggttct tctgaacaaa gttttgaaaa ttctaaccta 93300 

aaaactcaga cgatagcaag attaacatgt ttacaacccc ttttccttac tcctaatggt 93360 

acttgctcta catcagttga agggcttttc taaaagtgaa atgaaacaga agcggtgtga 93420

tttctttgta gacctaaagt gcctatgtca atgggaggag atactattat ccaaatatga 93480 

gttttctctt taagctgtag agggtaggtt gctgcatata tttaaaggaa agagcaggac 93540 

atagaaaatt aaattatagt atctttttta tccttttttc tttatttttg tcattatttt 93600 

attatttgtt aagtataata tgatgattaa acagatgata atatacaaga tatattgcat 93660 

aaatgtaaag attagataaa tatatttatg ccattaaaac ataagattta tatttactaa 93720 

ggtgaatatt ttccaaaatg atgcatcctt ctggagagtg aggaaagggc acttttactc 93780 

tttacataaa tctttatttt ctaattattg cattttcctc tcaaaggcca aagttgtaga 93840 

tttaagacaa gaaatcattt caatttccaa gtaacaggga acgtcagttc taattcatca 93900 

ccatcacttg ggtgccatac acacgcacag gcacacacac tcaaaaactg cgtgtcaaag 93960 

acaaattcaa agatgtgctt ccttgtagat ttcaggaggc atgttcactc ttcttggtat 94020 

gtgttggaga ctttaagttt tgtttgacta cttaccgctt attaaatgtc ttatttgcca 94080 
caatagaaaa gtcatccctg catctatgtg aaatatgttt aatttctctt gaaagttgcc 94140
taagacaatt ttctgcatgt actgttcata ctagctaaaa ctgctcccca ctcttattcc 94200
tctaggaaat tcctagttat ttttcaagcc ccagttagat tattgtcctt tgatgcttac 94260
cctgattcct gaaacaatta gttattttgt ttgtattttt attattgaac taatcatatt 94320
taactttaat tttcatgctc cttaccatga aaatcaacca gctctttcaa ggcaagcact 94380
gtgatcagtt gtcttcaatt ccccagcaaa gcaacttgca tgcatggagt gttcagtgct 94440
gtttgtgcac aaatgtaacc atattacaat ggttaaatca tttagcatcc tgaaagcatc 94500
acagagtcaa agtagctaac ttgtgtgaac ccttaattca atttccggga agaggttaga 94560
cactgaagta tgtagttgaa gacactggtg ttgcaaggaa ttgtatccca ctagctcaag 94620
cctacagact tctgttttgt gccctggatt aagataaatt taaaaatcga atgaattgcc 94680
aacactatct ttggtagatt tcccataaaa acaatgaatt ctgatattta aaatgtaaag 94740
ctcttaagac cttacaacac aaggtccctg ctaacataaa gcaaaattct attggcctga 94800
gtgacttctg agtctacact gggaaccact ctatgtttca cttcagtgga ctcttaaacc 94860
caacccactt cactcattct tattagctaa agggctccca tagacattta agttttgatc 94920
tcttatctcc tttagtgttt tacttataaa ataacacaag gcaacattgg gaatttataa 94980
taaaaaatta ttgtatagaa tcccagcacc ataatttccc attgtctcat ttcattacaa 95040
ttgatattat ctgaattaat attatctgaa agtataatct gttcattata atttgtattg 95100
tttgaactct aggaataatt ctatctctct tgattacttt tcattttatt attctggaca 95160
ttataatatt ataaagaagt ttgaactaat gctgctagaa acctgaaaca aatttgtgtt 95220
ccactagcag aacaaatttg tcacattctg ttatatgctg taaacccctt aaatgcattc 95280
aggattaaat taaaacataa atggaccagt agataaaatc tatgaaataa catttggcaa 95340
cttctcttgc tttactccta caaactcctt cgactgaaaa ttccttcatt agtaattcat 95400
ataggccctt gttattccaa atttagaaac cttatctttg gtttattcac aaagaaatga 95460
attgcttttg attttgttgg tatagcagta agaaaaaaaa agtgaagaaa aaaaacccag 95520
aatttccttc aaacccctat gattctggac tctctaatac agcacatagc cagtgaaatt 95580
ctcaaaggtg gagacactaa agcaggggtc ccctaacccc caggccatga actggtatgg 95640
tccatgtcct gttagcaact gggccacaca gcaggaggca agccaccagc aagtgagcat 95700
tcctgcctga gctccgcctc ccgtcagatt agtcttggca ttagattctc atgggagcac 95760
ccaccctatt gtgaactgca tgcgcatgcc agctatctag gttgtgtgct ccttacgaga 95820
atctaatgct tgattatccc caacaacccc accctccaac ccccatccat ggaaaaattg 95880
tcttccacaa aactggtcgt tggtgctaaa aaggttgggg actgttgtta taaaggaagc 95940
ccatgattgt atagtcaact gaaaaaggca gttagtggct taatccattt ttcgacaaac 96000
tttaggacac tattccctca ttccaaaaca cgttggtaac acagaaacca ttctttttag 96060
gaatcatctg catatgatac acccagttaa ttggggcctt tgctttttta agaccaccag 96120
gccctatatt cttgagtcct gcctgactgg ctgtgtgagg gagaacaaga gagaaagaaa 96180
aagagagaga gcaaataaga gaaagagaaa aaaggtaaca tgtaatggac ttctcattca 96240
ccttgattta ttttcatcca tttaagtgaa aaatgctggt acctaagaaa atggttagag 96300
gaaattctac tttggtgacg gaatttattc tcttgggatt aaaggatctt ccagagcttc 96360
agcccatcct ctttgtactg ttcctgctaa tctacctgat cactgtcggg gggaaccttg 96420
ggatgttggt gttgatcagg atagattcac gcctccacac ccccatgtat ttctttcttg 96480
ctagtttgtc ctgcttggat ttgtattact ccactaatgt gactcccaag atgttggtga 96540
acttcttctc agacaagaaa gccatttcct atgctgcttg tttagtccag tgctattttt 96600
tcattgctgt ggtgattact gaatattata tgctagctgt aatggcctat gataggtatg 96660
tggccatctg taaccctttg ctttacagca gcaagatgtc caaagggctc tgtattcgcc 96720
tgattgctgg tccatatgtc tatgggtttc ttagtggact gatggaaacc atgtggacat 96780
accacttgac cttctgtggc tccaatatca ttaatcactt ctactgtgct gacccacccc 96840
tcatccgact ttcctgctct gacactttca ttaaggaaac atccatgttt gtggtagcat 96900
gatttaacct ctccagctcc ctcatcataa tcctcatctc ctacatcttc attctcattg 96960
ccatcctgag gatgcgttct gctgaaagta ggcacaaagc gttctccacc tgcgggtccc 97020
acctggtggc agtgactgtg ttttatggaa ccctgttctg catgtacgtt agacctccca 97080
cggacaggtc agtggaacag tccaaagtca ttgctgtttt ctacactttt gtaagcccta 97140
tgttgaaccc catcatctat agtttgagga acaaggatgt gaaacaagct ttttggaaac 97200
tgatcagaag aaacgtgctt ttgaagtaaa atcagtgtat ctttattagt caaataaaaa 97260
aatctttcta tttatgagaa ctgtatttaa ctttagtagc tttacagtaa agtcaatttt 97320
agatccccaa tgaaaaacta taatctaaca acaaaacata gacatagata gagaaaattt 97380
atggttttac atgttacata tgagtgaaat atgtgtgtgt gaagggtagg gaggtgtata 97440
tgcatgtatg aggaataatt attcttttat tcataacatg catacattta catgtatgaa 97500
tttacatgtg taacaattta catgtaaatt tacatgtaaa ttacatttat aacaacatat 97560
tttttacatt ataaaaaatt atacattgtt gtaataaaac attaggtctc tatggaaagg 97620
taaaaagtaa gagattaaaa tattttattt tctttcccac accctgattt tttattctca 97680
cttcctagag ttacaacgat taacagtttc tttagttttt tttttttaat ttcttcgtat 97740
ctacaaacaa atgtgtggtt atatgtacac acatatatac acgattgtta aaacacaagt 97800
tgactgtcat ggttgctttt tagtggttta caattctgta gcaattacta ccaagaaact 97860
tcagtctcat ggtttagagg aatgtcacaa ctgctggtga gacttgctgt ccctctaggc 97920
tgtttctcta ggagagaaaa cttacgataa cacctctgtg tccatgcctg acacccactt 97980 
atgtctctcc acttgtgaga gtcactttca ataagacacc tttctccttc tgccacgaca 98040 
ggacatgagg ccagcccagc agttattttt caaatattaa ctctggaagc agacaaagaa 98100 
tggctcctat aagggtatat gactggctgc aaataactgt gacttctgca gaacttatca 98160 
attcaattct acacttctgg ggactagagt ccttcatgct ttctctcacc tctgcttttt 98220 
taatctattt ttcaggatag atcatatatt tttattagta gcatcattca aattttactc 98280 
acagtggtac tttgtcacag ttagttatta tatacctaat tttagaatga catatgtcga 98340 
cttgtttgta agtattttag tgctaaagga agcaaaggac aagttgatat tttaaggatg 98400 
tgacaaatct ggtgactttg cagttgataa cataagcaaa attgatgaat gtgtgctgag 98460 
tatctacaat aagtcacact tctcctaaca ccaaggactt ttcaaggagc tcgacaatgg 98520 
catagtttta caggggtcat gaaactgtaa acagaaagat caatacaatt tgttaaatgc 98580 
tgtaagacta tggaggtgct aacgaaagag aggaaagaaa aaattatctc tgggcaaatt 98640 
gaagctagcc ctgtgattgt ccaagactag aacttgttta tgctttatgc accttcctca 98700 
gccctcccat agctcctagt atgacgtaat gtcaggtggg gtctgagcaa aataattaag 98760 
aaccaatttg tgatcaaata tgtagatgtt agtttcaatg aaaaagaaaa aaccataaaa 98820 
ctataacaag tcacaactca aaaagacatg atattttttc tcagttgtat ttcaaagctt 98880 
tctttgtacc ttcgagttaa tctctgctta gtatttctac agaaattttc agactcattc 98940
attttatatt attgcatata attattttaa accttgtact actacatgtt cctatttaaa 99000 
taccttgatt ttgaaagtca gattgcccac aaattctgtt aaaacacaca tataaataaa 99060 
aaagctattt ttactaattt tttacattta attaacaatt gttgaattcc gcttagacac 99120 
atacctagat gtattcctgt ttaaatacac atatgttttt agttatgtga aaccagcttc 99180 
ttaattatat aaaattttct gatctcaggg cccatgcttc tatctgtaca gcctagcttg 99240 
gcaccgaaga aaggttcaat gactgcagca tgaaagagta cacatcatac acaattttat 99300 
taaaagttaa tgaatgtcca ctttcaaagt aaacaaagtg cctcaatttt agaatttata 99360 
agaaaagtgt ctttgaacag accagcagat gactctgagg gaatatgtta ttagtcataa 99420 
gccaacttac ttttcagaaa tacaagtaac cataatgtca ttgatgttta tgactgactt 99480 
tagcaataca aaaccaatag ttacgcccat atacacaaaa gttcaagata gaaaggaatt 99540 
acaccaagtg ctttcatgaa ttgactggct cgctggaatc acgtctatca gcctgttaag 99600 
actatctaag tccatatcat attgtattag acataaaata aatatgtgct gatgaaaatt 99660 
tttggtatta atatgttatt ttagtaggcc ctggatttta gctaaaaagt aaagaataac 99720 
tttcttattg atgacattaa aatcccatta ttaaagtctg tcataagaga tttcaagaag 99780 
agtctccaat ttcatgtctc ttttctttca aagttaattc tgctgaattc tttatttgtt 99840 
ggacaaattg ggtaatacaa tgaagaaata atcagaaaaa ataaagaata tttactgtac 99900 
ttttgaaaag aacaataaat atgatatatt attttctcct cacctatcca aaagatcatc 99960 
actctgatat tcctttcagg aaaatttata catgtgtata aaatactaaa tgttataggg 100020 
agtagtgatg ttatgagcct aatgtcttta ctcatataac agatgttttt gggaacaatg 100080 
aataagtgaa aaatatttag acactcacac tcacacatac acacaaaaat aagacagagt 100140 
aatacttttt attatagttc aggtaatatt ttaatgtaag ttgttttact taatctgccc 100200 
atcaacacta tgcattcagt actataatct ccactttaaa gaaaagacaa cagatttaca 100260 
agttgttaac ctaaaagcac ataggtagat tagaagggga gctcaaatat gcctggcttt 100320 
aaagtttatt tgtttgtttg tgttactatt atacttttgc cactatactt cattgaaggc 100380 
aaaatcatac aatgaaaaat ataattattt tctatcttcc agaagtattg ctgatcatgt 100440 
atttgacagc tgcatgcaca ctgaaggata taatgcacta taaaaaattc tttttataat 100500 
gtatagataa atttaacttg ttccataaga caaatttttg cagtcagatc ctaactccat 100560 
catttattag ctaaatgagt agttatgagc atgaactctg gaatctgatt tactgagcaa 100620 
gttatacaac ctctctcggc ctcagtgccc tcttctataa agtgaagaca tagtgatttt 100680 
tcataaggat gctgtcttca cagaagatat taattaacac atgaaacaca cttaggacag 100740 
tactggatac atagtattta ctcaataaaa ggtacctgtt ttaattatta actggaaagt 100800 
tactttacct atcttagact taggttcctt tcataacctt ctatgttttc ttttaataca 100860 
tttattactc tttggatata attaatgtct gtgttctcta taatattgta agtgacatga 100920 
ggtcagaaac aatgatgttt tgagagcaat tattttctat ttttcagcct ggtatgtagt 100980 
atgtgatccc taaaaaatga tagatagatg attgatgggt agatgacaga tagatgatag 101040 
atagatagat agatagaaaa aaagaagtaa aaaaaataat ggagattata atattagttt 101100 
tgtaaggtaa tgggattttc ctgcaaaaca aaacactgat ggtaacttaa atgtgtatat 101160 
tttattcatg tataatcagt ctgcttttgt ctccttcaac agaattcatc tggaaatgtg 101220 
cagaataaat caaaccagag agaagaaatt tatcttcctg aggttggcca gttgcccggc 101280 
gttgcatctt gtcttctttc tcttggttct attaatttgt atttttcttt ttctggaaaa 101340 
cttggcccta aatctgcaca tcacatcctc acactgtctc cttgccaatc tctcaatcct 101400 
acaaattctg gttaactttt tatttggcaa gttcaattcc atagccaatt tatatgaaca 101460 
gcataacatg gtcacattcc cgttaaccca gtgttttgtc cctgcagcca tctgtggcaa 101520 
tcacatccac atggtcatct gccttccatg tagtcacaat attgggtcca aagtgctcca 101580 
cgtttatatg gctgtccttc tctatgtttg catttcatgt aaacttattt aacatggtgt 101640 
tatccccaga

ccaaataaaa 

atgagagttc 

cagaaaaact 

cttgaaggaa 

caaccaaata 

ttttatgtgt 

tgggtgtgtc 

taaaggagag 

agaatagaaa 

ttcgcctgct 

gacttaaact 

tttcctggtt 

atgagccagt 

tatatatata 

ctctggagaa 

ttttaaggat 

ttagatttaa 

cataaactgt 

ataagatata 

gattataacg 

gatgagcttg 

ggagatcctt 

cgttgggggt 

taagtgctat 

tccccgttcc 

ccagaataaa 

ttttggggcc 

caacagcccc 

gatgagtttt 

agcatatggt 

agctcattaa 

aagtttggtt 

atgtttaact 

attcaagagt 

aagcttttca 

ggaaaatcaa 

tttcaatggc 

aaatagaata 

tggtgttccc 

tgcaaaaaca 

tacagtccct 

gaaggggaga 

ggacctttta 

cttgacactg 

cagtaggggc 

gtccctgaac 

tactggcaga 

gaaataattt 

aagctttttg 

taggaatttc 

gatcaagaag 

taataaagtt 

atttagactt 

gttggaaagt 

aggtcacaaa 

tttaaaaaag 

ttcttgccct 

agcaccatgc 

ctataaatta 

ttggtactca 

tgagtaatga 

catgagtgga 



gctgacgaca 
ccaaatttca 
taactttata 
tctctgtttt 
tcaataacac 
aatagaaata 
caacttgact 
tgtgaatgtg 
tgttctcacc 
ggcaaaagaa 
cttggacaat 
atttgttccc 
ttccagcctg 
ttccattgtg 
tatatatata 
ccctgatgaa 
atattttctt 
agatgctaat 
ttatacatat 
aggaactatg 
atattgattg 
gggattttaa 
gtctcctgta 
ttgtgggaag 
ctgacagtga 
cagtggtatc 
tggtaaagac 
cacctttacc 
tgaaagaaga 
ctaatttata 
aaaagagtag 
acagagattc 
gtttggctga 
tcccttggtt 
ggacttgcca 
ccaatacatt 
aatgcagtaa 
ccaaggcaag 
gtctgaccca 
agttgtataa 
ttttaggtca 
caatcaattt 
caaggtcttc 
tctgggtaac 
gctctgaact 
ttatgatgat 
tcagcctgtg 
atccccatgt 
atttaaagtg 
ggtaattctc 
aaagggtaga 
aatagcttgg 
gaatagtaaa 
aattggatct 
taatccccac 
agttacaacc 
caattataaa 
ctgccttcct 
attttgactt 
cttagtctgt 
gaagtgtggc 
gtagaggcca 
ggtttaaggg 



tggaatagca 

ccacacgcca 

ttttccaaac 

tgaatttcac 

aaaacagtag 

gaaagaacgt 

gggcgacagg 

tgtctggaag 

aatgtgggct 

gagcagattt 

gttggtgctc 

ctaccaggtc 

caggcggcaa 

tgtgtgcgtg 

tatgtaaaat 

tacagatttt 

atctgtttct 

gactatttag 

atgcaaagta 

tgactctcta 

gttactctta 

tttccagtcc 

gcttcaatta 

atcctgatga 

aagaggtttc 

agcttttcca 

ttcccctgag 

acccttcttt 

ggtaccaagc 

caaacagaaa 

aatgaagata 

tccatttaat 

aacatgaatc 

taatgtagag 

ttgaagacct 

gataaataaa 

tagtaactag 

gtgggcatag 

cacgtatcta 

gtaaatagaa 

agtgaacaaa 

gtagacttga 

tcaaggaaga 

tgcattggag 

gacaatttca 

tagattatca 

gttatttctc 

tggtcctcct 

ttttgtgagc 

ccagagaagg 

ggatgaattg 

ttgtgcttat 

ctagatcaga 

acatgatgtg 

cgtaacagtg 

ttgtgggatt 

tgggcttgag 

ccatgggctg 

ctcggttacc 

gttattctgt 

tgttctatat 

aaagaatttg 

tgattatggt 
tcatatcctg
tgttgctact 
aaatttggaa 
cttatgaata 
ctgccctata 
aacgtttgaa 
gtgcacaaat 
agattttacc 
ggcatcatgc 
gttttctctg 
ctggacctcg 
ttcaaactta 
atcacaggac 
tgtgtgtgtg 
cttcattcta 
ggtattgaga 
gaggtttctg 
agtagtcgaa 
tccacattgg 
tatacttttg 
atgtcactgg 
aaccctgaga 
gagaattgca 
agctggaaag 
cctactccta 
cctatgtctt 
gcagtttcca 
gcttctaaac 
aagaccatga 
tccagggaac 
tagttggatc 
gttgcaactt
aaaagatggt 
gaaggaattc 
actcacccat 
cttgtgaggg 
atcatggggt 
ttaccataat 
tgacattggc 
agcccactaa 
agtctattct 
ggcagtttac 
tttccatata 
aaaaggagat 
ggctcacaac 
gtgaattcca 
tggttctaaa 
acttgtggag 
acaaagaaag 
taaccactga 
caaagacttt 
gacacaaata 
taaggcaaga 
gtgatttgaa 
ttaaaaagta 
gataaaaagg 
gctgcaagtt 
acacagcatg 
agaactataa 
tgtagcagca 
cagctgaaca 
caggaacagt 
gagggccagg 



ccaccttgtg 

ccataatgtg 

tctttctact 

tgtctagata 

tgtttgtgac 

aatagtagca 

acttggtaaa 

atttgaattg 

aatctgctga 

ggtgagttgg 

gacctgcaga 

gactgaatta 

tttttggcct 

taatcatgta 

tatatagaat 

gtggttctag 

gaattggctc 

aaagcattga 

attatcctca 

tactttctta 

acaaagtgct 

gcttctatgt 

gaaagggatc 

actgagcccc 

tactcctagt 

gggtaattaa 

tacaagacaa 

ttataactaa 

ggtggtgcac 

atgtgtagga 

aggtctaatt 

ggggagttag 

ccactgtgag 

aaaggtctag 

actggaaggg 

gaaactgcaa 

ggcagaggcc 

gaacagcagg 

tagcacaggc 

attcttactt 

gaattataat 

agacccagaa 

ttgctaggtc 

aatcagactt 

atcattatag 

tctcaaagtg 

atgcataaat 

cttgggtaaa 

aaaaaaatta 

gatgggtttt 

tcctaaagag 

atagatagga 

atgaaatgct 

cgtatcctca 

gggcctaatg 

ctaaataaag 

ctctaacttg 

aaggccttta 

aacaggtaca 

taaaacagac 

tgtgaaagtg 

ctagaaaaag 

aaatgtaggg 



gctactccga 

atggaagtgt 

ctttgtaagc 

atgttacctc 

tctttttgaa 

tgatggttaa 

atgttatttc 

gaagactgag 

gggcctgaat 

gacatcaatc 

ctctgaccag 

aaccaccagc 

ttatagttgc 

atgttatata 

gcttctcttt 

agcaacataa 

tttaatttca 

tagtccatga 

tcagacactt 

gaaaactaag 

gaaagataag 

atgtctggaa 

ctatatgttg 

taaattctga 

ggaattggcc 

ctccatattg 

tacagattct 

acccaagtac 

taaaaaagta 

atggatacta 

tgtcaatatg 

aaaaagctct 

taaactggaa 

agaaattaaa 

ttcagaagac 

ccacacaatt 

aggtggtggc 

ttaaagcatc 

tagttaatta 

gatctgaata 

aactgatagt 

ctccttaaat 

ccacagagag 

ttgagtacta 

ccccttcagt 

ggcccagtgg 

agaatagaca 

agaaatagtt 

tcttgacaaa 

gaaagttgag 

tgaaccaact 
acttcttaaa

aggccattct 

aggccatctt 

tctgtggaat 

gaagcaaaaa 

agaaagaaaa 

actcttagcc 

tgttcatgta 

gaaaaattgt 

gtatgtttag 

catctccttg 

ccttagtcat 

gtccttcccg 

atagccagtc 

gcagtaggga 

tcgccatcgc 

aacttttgcc 

ttgctacaga 

ggatgaccct 

atccagagag 

gtatgtagta 

aactaatgga 

aaacagggtt 

cgtaataaat 

ttgtaatcaa 

catgttagtg 

tcactcaaca 

caatatataa 

cataaaaagt 
gacttttttt
cagttctttc

aagcaataag 

gccattttaa 
tgttctccag

gaaaacatat 
ctttatatac

atagttccta 

gcaaagttgg 
gtgagaaata

agtgctgtaa 

gaaatataat 

atcttctact 

gtaaacatta 

aggtctcaag 

accatagtcc 
ttttatacag
acagaactta 
tggccatgta
ggatacagta 
taagtacaaa 
cttccctgtt 
cccctagttt 
cctcagtcac 
ccgaaactac 
agaattagct 
ctagctgctt 
tccatccttg 
tgagtgctag 
gatgaacata 
gaaagcattt 
atacataaat 
caatcatata 
ctagcttgtg 
tctcagtaga 
ttaccagata 
tgtttaatca 
gctagagaga 
gctttttgaa 
tgggtctaat 
tctgtacatt 
aacatctgta 
tcatttgcac 
aagcaacaca 
tttttttcaa 
tgcattagtg 
tgccaagtaa 
taaagtcatt 
aatccatgtt 
cttctttgtg 
aaacctttcc 
gggctgtcta 
aagccaaata 
ggagtatggg 
actactgaac 
atgttgtcag 
acattttgaa 
agctaacatg 
gagttggttt 
taattttaat 
atgtaaggtt 
aaaaaaagat 
gcatgcacac 
tgtattatat 
atagtcaggg 
ttacatgaac 
agctagagta 
tagagaaaaa 
ggaggggagt 
aataaccggt 
agatttaatg 
tgccttccac 
cagatttcta 
ttgctgacat 
ttgaatataa 



gtggctatca 
caggtgaaac 
tggcaaataa 
agagtggtga 
gtgcacagct 
atttgtaatg 
aataataaaa 
aattcctctt 
gagttctgag 
ctttgttctc 
cagtaaacaa 
ctgctctgtc 
tcaccctgcc 
tagactgtgt 
taggataagc 
agatgcaccc 
tttcactttg 
aatagaagga 
caaagatctt 
atatgtttag 
ttttaacgat 
aaaacaggat 
actaaactgc 
ttgcttccgt 
tgaaattaga 
ggagattcat 
tttgagaaaa 
aatatttatt 
gaatttataa 
tgtggtacat 
tctagatttc 
aaataaacaa 
atgaatgtat 
gccaagtgaa 
tggagggagt 
accaagagat 
caacatcact 
atttcttttt 
ttatttatta 
tttgtgtgtg 
cttattcagt 
tggttatata 
aaatgggact 
atctttacat 
gtgagaaaac 
cagcatagtt 
tgggagatga 
gttttggaaa 
aggacatgaa 
gaagaaaaga 
tgaaataatt 
atgtctatgt 
gaagagagag 
agccttttag 
gaatttgaga 
tacagttttt 
agaattaaga 
gacaaatcct 
ttttaatttt 
agagttgcaa 
aatggcgtgg 
caggattatc 
tctctcacag 
tcaaagtgct
tgagaaacaa 
attggcagaa 
gctagactat 
tattttgggt 
aaaaggaagc 
aagcatgttt 
caaagtttag 
ttccacttca 
cattttaaag 
ccccctccta 
cttagttatc 
actctggctc 
ggtccaaccc 
acctcttccc 
ttctatagaa 
tagcaccgaa 
atccaggttg 
gtggaaaagc 
tttttaactg 
gtggaaactg 
ttctcgtctc 
ctttccttgg 
gtacctggaa 
tgaaataaca 
agtcaaagct 
ttgaattttc 
tcatattttt 
aaacttttta 
catggcctag 
ttcaattata 
agaaaaagaa 
gctcaccgta 
aagagtgagt 
gaggtagaac 
gtgaaaaaga 
gagctgtctt 
aatgcaacag 
tctctgaagg 
tgtgttagga 
agaaatgatg 
ggtatcaatg 
aagaaagtaa 
agattataac 
taaagctcaa 
gtaatttttt 
ataccagtac 
gtacaaatgt 
tatatagaag 
ttgcaaagca 
aaaaactaca 
gtatacatac 
agaggttaaa 
ttcgtcaaag 
ttaggagacc 
gaggtcattt 
gaaaccaggg 
ctatgattag 
aatttttatt 
tattatctgt 
taagcattat
attaatttaa 
tgtactggaa
ttgtgtccat 
ctggaagaag 
gcttacagta 
agcatgaagc 
cctgttaact
tttaacctcc
gcctctatca 
tagtcatcta 
ctcccttgta
gtaaattgcc 
gatgcccagt
ggacacagaa 
cagggaacat 
ttcatatgtt

ttaagagtat 

acagttaaac 

tatcagctct 

ataaagagtt 

ccagtgagag 

aagtagctta 

aaatctaatt 

ggctttgttt 

agttctggct 

gcaggctcag 

tctaatcact 

gtcatttcca 

tgaaattaac 

tggtatcctc 

tagcatattc 

ctatccttat 

aatcagaaaa 

agctaagtta 

ctagtggcaa 

agacacagta 

cctgtttcaa 

tgcagaggag 

aatatctact 

cattggtgac 

tcttcacgct 

ccctcatcca 

tcctttgtgg 

tcagagaaga 

ttggtccacg 

tgcaaccctc 

gtgctttatg 

gccttctgtg 

ctggcttgtt 

ttcacttatc 

aggatctgct 

gccgttacta 

tccatggagc 

cccatgatct 

aaaagaaaat 

ttaccctatg 

tcctagaagt 

tcaattaaca 

gctctatagt 

aaatccttaa 

ctgcttctta 

caaaaagaaa 

cacagcttac 

cctctgccat 

ttttttttgt 

ccagtaatct 

cccagcctat 

tgtgtcttac 

catcccttgt 

ccagccaaaa 

catacctggt 

ttgaatgtta 

gcatgtctaa 

cacagatttg 

ggcctatagt 

agagactaat 

ttcggattat 

aggatgaaac 



ttcatcttat 

ttatattagt 

aaattagaag 

gccaaggttt 

actctcatct 

gcagaaaaga 

attttctttt 

ttagaatttt 

tttaaaattt 

ctgaggagca 

gctctgtcaa 

tcgctatgca 

tgctctaatt 

tagaagcatg 

ctcaaatatt 

tgtttgtcac 

aacaggaacc 

agaatgtcag 

tctaaacaat 

aatctgagca 

gaaatgaata 

ttaaatcact 

caatgcaaaa 

aatccaacat 

tgagttcatt 

gtttctcacc 

ggccaacgcc 

atctgtgctt 

aaagcatttc 

ttgagctcta 

tgctttatgg 

tgtatggagc 

gccccagtga 

ctgacaccta 

ctctccttat 

ctacagaagg 

ttttctattc 

aggggaaaat 

acagtctgag 

tgttttctaa 

atttttcata 

gttggggaat 

ggaattttga 

gaagaatcaa 

aaccagacct 

aacatatatt 

acagcatctc 

tgcaatcttg 

agtagttggg 

agagacggga 

acctgcctgg 

aataagcttt 

aaagaaagtc 

acactcagct 

acacaccaac 

ttatttcttg 

cctaaaagtc 

aaccacatca 

catgggatca 

tcccctagag 

gatgaggtgt 

tgaagggcta 

tagtccgttt 



atagcagtca 

tagataacgg 

tttatttctc 

taggagctcc 

atatgattca 

agaagggaag 

attacattct 

aaaaaaagga 

atttaactta 

aatacttaac 

atctgtctgt 

tgctcagtat 

caggtctgtc 

ttagtagaca 

tatcttttat 

ataaagaatg 

taagggattt 

gtaccaggtt 

taatttacaa 

ataaagatgt 

tgagtaagaa 

atctctagtg 

caatgtttca 

tctcatttcc 

ctcctgggac 

atttacatgg 

ccggctccac 

ctcttccaat 

ctatcctgcc 

catcctggct 

cagcagaatg 

actcactggc 

aattaatcac 

caacaaggag 

catcctcatt 

caggcacaaa 

agctcttttc 

ggtagctgta 

gaacaaagat 

ataaacatta 

gagcatagtc 

ttttatgaag 

attcaatata 

tgtaaatttt 

gggttttctt 

aagctttttt 

actccatcac 

atctctggtg 

actacaggtg 

tttcaccatg 

gcttcccaaa 

aagaaaaata 

aacattccac 

gtcatctcag 

actttggtcc 

gctttttaga 

tgtctaagga 

acccaaaagg 

ctgtcttcca 

gtaacctaga 

gtgcaaggtg 

caacctctag 

agacttggtt 
agtttacaat

attaaattcc 

atttatgtaa 

aactaattta 

ggatggctca 

ggcaaacctg 

cttggccaga 

aattctggtc 

tttattttta 

ctggttaagg 

tgtctggatg 

gcacaaatat 

gctccaaaaa 

ttttgaaaat 

tttcccccaa 

agctaaaatg 

tatctgacat 

ctcaacttac 

aagtgctgag 

gagataaaaa 

attaagtgac 

tctcctgaca 

agaaaatata 

cactgaagga 

tggccaatca 

tcacggtggc 

acgcccatgt 

gtgactccaa 

cgtcttgtgc 

gtgatggcct 

tccaagagcg 

ctgatggaga 

ttctactgtg 

gtgtcaatgt 

tcctatctct 

gctttttcta 

ttcatgtatc 

ttttatacca 

gtgaaagagg 

ctactgattt 

tgatacaaat 

tgtgaataca 

gttttgcctg 

aaaaaatcta 

tttgcaagga 

taaaaaaaac 

ccaggatgga 

ctcaaatcat 

tacaccatga 

ctgcccagac 

atttaggaat 

gcagttctgc 

tttaccatat 

ctgttgcctc 

acagtgaggc 

ggaaaaataa 

gatccaaggc 

tgtaacatgt 

gtcttgaaga 

acttaagata 

agaaattaga 

agtaggacag 

tcatggagga 



aatcttatta 

tatgagaaca 

aattctgagc 

ctcttctttc 

gcactatatc 

ttttatcaat 

agctaccctt 

atttttgttt 

agagctgtcc 

gaaaggtact 

tctggatgat 

agacagacat 

attacctact 

ttttctgtaa 

aagtagcagc 

tatttttagt 

ggcatggaag 

agttaagtta 

tattcatttg 

agcaaaatgt 

ataaaccgaa 

caaaacaaat 

aaaatgtttg 

aattatgaga 

ccgggaatta 

aggaaatctt 

actttttcct 

ggatgctgga 

agtgttacct 

ttgaccggta 

tgtgctcttt 

ctatgtggac 

tggacccacc 

ttgttgtggc 

acatatttcc 

cctgtggctc 

tcagacgtcc 

ctgtaatccc 

cattatgcaa 

ttgttgtgtt 

atatccaaaa 

agaaaaaatg 

cccatacccc 

atttatactt 

gttaggcata 

aacaacaaca 

cctgagtgca 

cctcccacct 

tacctgacta 

tggtcttgaa 

tacaggtgta 

taaactctac 

tcaacctaaa 

tgcctgtacc 

acacttatgt 

atggctagta 

taacagacaa 

ggttgtaatg 

tgtgtgtttt 

taatcacata 

acacagatgc 

agtcaacaac 

gttaaaaagt 



agcaccaacc 

aaacaaaata 

tagtatgtca 

tctgctttct 

cacattacag 

ggatggataa 

ttgccatagc 

tgttttcttt 

cataattatc 

agcttctagt 

cacttagcca 
ctgaagttta
tagtctaaat

tttttactac 

aattaactgt 
aaaaaaaaat
gattttcctt
agaactgttc
aaagtaagaa

tagagaaaaa 
gcggtgtgat
atttttgtgt

ttgctgagct 

agccactctg 

taaatacttt 

atatgagcct 

aacctcatct 

ttccacagtc 

gacttctaaa 

tctgttcaat 
aatttatgac aataattaat gctcatgtta agtcagcatt tatatttata ctgctttgat 113040
gggctatcaa atctcaagct aaataatgaa gaacataaag aggcccaggt tagaagaagc 113100 
ttcatgacac tggaataagc aaacataaac ataagtcaga gaatgggtta tcaaccactg 113160 
agacttcaaa gatagcccac tgaaaaagtt ctaattcaca gaaattttcc aggaaaataa 113220 
gaaataaaaa gcacaaatga atatctttaa tttatttcag atattaggat tacttgatgc 113280 
aaattgtaaa ataaagatat ttaatcattg aaaataaatg tgaaaattta aactctaaga 113340 
ccatgtatat atgtgtgtac atatgtatgt gtacatatat atatgcaaag actgattata 113400 
gaaaccaata attatatata atatgtagag ttccaagcat atctgcaggc tgaaggagag 113460 
tctctgtgtt ttataggcat tgaagcattt ttagagtgga taggcaggta tataaatttt 113520 
cacattgaaa ggaagaggaa gaaacagcta agtccctctc atgaggcttt aagtggtctt 113580 
ggagagtcag cctttctctt cttcttcctt agtaaatttc atgcagacct tggcagccta 113640 
atcactacaa tcctgcatca gtgagtcctc tgaactggca ccatagttca ctaggaaagg 113700 
cactcatcca ccaagttcag agaggtggcc atccgcattg ctaaacagaa taaaacaaaa 113760 
attcactttg tcatctgcct gacattggcc ttgctataag atagcctcat aaactgcctg 113820 
aaaggcctta ttcttttcaa aggattttgt gctgtcttca agtgatattt ttctccatgt 113880
cagaaataaa atatttcagg attgacctgc ctcaaatttt agtttgtctt ggttattggc 113940 
cattgcatgt ataagtctaa ttgtgtgaca gaagtttccg atactctgct tcatgaagta 114000 
ccctttagga ctatctcctt gtctcttcag ttcttcaatc aatttttcct gaagttcttg 114060
tgtgaggctc tgatacaaag agtaggcagg tgctggcatc cagcccagag ccaccctttc 114120 
cagccctagc acatccttaa agcacccctg accagcaccc tcaacctggc caacaggttt 114180 
ttcagcatct acaggttaac ctccatgggc ttgagctgca ccttcacaaa tcacaagaga 114240 
aaccacaaca cagagtacag ggaaaacagc ttttcaaaat tggtaaggtt agtagtggtg 114300 
ttaagatatc accctgtctt cctatttgaa taccctattt tttttttgcc tgattgccct 114360 
ggccagatct tccaatactg tgtttaatag gagtggtgag agagggcatc cttgtcttgt 114420 
gtcacttttc aaagggaatg tttccagctt ttgcccattc agtatgatat tggctgtgag 114480 
tttgtcataa atagctctta ttattttgag atacattcca tcaataccta gtttattgag 114540 
agttttcagc atgaaggggt gttgaatttc atagaacgcc tttttttctt catctattga 114600 
gataatcatg tggattttgt cattcgttct gtttatgtga tggattatgt ttattgactt 114660 
gggtatgtta agttagcctt gcatcccagc gataaagcca acttgatcgt ggtggataag 114720
ctttttgatg tgctgttgga tttggtttgc cagtatttta ttgaggatat ttgcatcgat 114780 
gttcatcagg gatactggcc tgaaattttc tttttttgtt gtgtctctgc caggttttga 114840 
tatcaggatg acgctggcct cataaaatga gttagggagg agtccctctt tttctgttat 114900
ttggaatagt ttcagaagga atggtaccag ctcctgtttg tacctctggt agaattcggc 114960
tgtgattcca tctggtcctg ggtttttttt gattggtagg ctattaatta ttgccacaat 115020 
ttcagaactt gttattgatc tatccaggga tttggcttct tccttgtttt ggagatgaca 115080 
tgactgtata tttagaaaac cccattgtct cagccccaaa tctccttaag ctgatgaaca 115140 
tcttcagcaa agtcttagga tacaaaatca atgtgcaaaa atcacaagca ttcctataca 115200 
ccaataacag acaaacagag agacaaatca tgagtgaatt cccattcaca attgctacaa 115260 
agagaataaa atgcctagga atccaactta caagggatgt gaaggacctc ttcaaggaga 115320 
actacaaacc accactcaag gaaataagag aggacacaaa cagatggaaa aacattccat 115380 
gctcatggat aggaagaatc aatatcatga aaatggtcaa actgcccaaa gtaatttatt 115440 
gattcaatgc tatccctgtc aagctaccat tgactttctt cacagaatta gaaaaatctc 115500 
ttgtaaattt catatgaaac tgaaaaagaa cccatatagc caagacaatc ctaagcaaaa 115560 
agaactaagc tggaagcatc atactacctg acttcaaact atactacaag gctacagtaa 115620 
caaaaacagc atgatactgg taccaaaaca catatataga ccaatggaac agaacagagg 115680 
cctcagaaat gatgccacac atctacaacc atctgatctt tgacaaacct ggcaaaagca 115740 
agcaatagga taaaggattc cctgtttaat aaatgatgtt gggaaaactg gctagccata 115800 
tgcagaaaac tgaagctgga ccccttcctt atgccttaga caaaaattaa ctcaagaggg 115860 
ataaaagact taaacgtaag acctaaagcc acaaaaaccc tagaagaaaa cctaggcaat 115920 
accattcagg acataggaac aggcaaagac ttcatgacta aaacatcaaa agcaatggca 115980 
acagaagcca aaactggcaa atgggatctc actaaactaa agagcttctg cacagcaaag 116040 
gaaactatca tcagagtgaa caggcaacct acaaaatggg agaaaatttt tgcaatctat 116100 
ccatccagaa tctataagga acttaaacaa atttacagga aataaacaaa cgaccccatc 116160 
aaaaagtggg caaaagatat gaacagacac ttcacagaag aagacattta tgtgggcaat 116220 
aaacatgtga aaaaaacctc atcatcactg gtcattagag aaatgcacat caaaaaaaca 116280 
caatgagata ccatctcacg ccagttagaa tggcgatcat gaaaaagtca ggaaacaaca 116340 
gatgctggag aggacgtgga gaaacaggaa tgcttttaca ttgttagtgg gagtataaat 116400 
tagttcaacc attgtggaag acagtgtggt gattcctcga ggatctagaa ccggaaatac 116460 
catttgaccc aacaatccca ttactgggtt acatacccaa aagattataa atattctact 116520 
ataaagacac atgcacatgt atgttcattg cagcactatt cacaatagca aagacttgaa 116580 
accaacacaa atgcccatca accatagact ggataaagaa aatatggcac atatacacca 116640 
tggcacacta tgcagacata aaagaggatg agttcatgtc ctatgcaggg aaatggatga 116700 
acctggaaac catcattctc agcaaactaa cacaggaaca gaaaacaaaa cactgcatgt 116760
tgttctcact tataagtggg tgtgaacaat gagaacacat ggacacagga aggggaatat 116820
cacacaccag ggcttgtcag gggttcgggg gctagggaag ggatagcact aggagaaata 116880 
cttaatgtag attacaggtt gatgggtgca gcaaaccacc atagcacata tatacctatg 116940 
taacaaacct gcaggttctg cacatgtatc ccagaactta aagtgtaata aaacaaatgt 117000 
atgtcacctt gttaaaaaac agatagtaga aaaaccagca aaatgagaac tggtgctata 117060 
taaaaatgta ttcaaagaac caagtaacat atgactttgt gctcaacttt aataatcatt 117120 
agggaagtac acattacaat tacaagatat ttgtgttttt atatatctgt acctatatgt 117180 
tgctaaatga caaaaagaaa aaaatgatat tgctaagtac tggtgggact atggaacaat 117240 
tgaatctctc acataatgtt agtgagaata caaatggata caaccagttt ggaaaactat 117300
taggcagcat aaatttaagc tggacatatt tatactccat aagccagcac ttctactcat 117360 
agttacagca gtcagacatg catgaatatg ttcatgaaga gatatgcaca agactgaatt 117420
ttgtaatagc cacatctgga aataacctat atataaagtt tggtgtactt attatacagc 117480 
agtgcagatg aaggaattat ttacatattc atcacactca tcatcatgaa taaaaatcac 117540 
caacataata tttagcacca atgtagtgtt cattggagac tagacctaac aaaatatgca 117600 
ctgtataatt tctattacag aaaattcaaa accagacaca attaatctac agtgttaaaa 117660 
gcaaagatat agttaatttg aggttactga ctaggaaagg gcatgaagac agatggtgga 117720 
gtactggtaa tcttctattt gagaatttag aagcaagtaa tacaagtatg ttcactttgt 117780
aaaaattcat gaagcaggcc gggcacggtg gctcgtgcct gtaatcccag gactttggga 117840 
ggccgaggca ggtgaatcac gaggtcagga ggttgagacc atcctggcca gcacagtgaa 117900 
accctgtctc tactaaaaat acaaaaaatt agccaggcgt ggtggcgggc acccgtagtc 117960 
ccagctacta gggaggctga ggcaggagaa tggcatgaac ccggaggcgg agcttgcagt 118020 
gagccgagat cccaccactg cactccagcc tgggcaacag agcgacactc tgactcaaaa 118080 
aaaaaaaaaa aatcataatt catgaagctg tacacttacg ttttgtgctc attatttgtg 118140
tgtaaattag acttcaatat aaagcttact aaaaacgaat aaaaatagta ctagtcttca 118200 
agcaagcaaa gcttcattcc aatatcaaag cattctattt acctatcagt acacagaggg 118260 
tattagtttg ctagggctgc cacaaataag taccatgaac ttggtgactt aaacatgcag 118320
atttatttcc tcacagttct agaggctaga agtccaagat caaggtgtgg gcaaaattgg 118380
tttcattctg agttctcttt ctggcttgtg gatgatcatc ttatcccgac ctctttacac 118440 
tttatttttc tgtgtgtatc tgtattctaa tctcttctta taaggatgca agttatattg 118500 
gattagggca cagctcaccc attaggcttt attttactaa atgttctctt tagatattgt 118560 
gtctccaaca gtcatgctct gtggtgcttg gagttagaac ttcagcatat gaattttgga 118620
gagggaggga aggggcacaa tccagtccat aacacagatt aagaaacgtg aaaggctaat 118680 
agaagtttga cacaaagttt gtgacactag tacgagagaa actgtatcag aaaagttgaa 118740 
ttaagttgaa agtaacatgg taaacctaag gcaatgtgag aatccatggc agacatgaat 118800 
gttatcttat ggattttcca atgtaagaag gaaaatactg agaatgaaac ataaggcaga 118860 
gaaggaccag agagttgtga gttcccattt taaatttgtg ttgtgccaaa tgtcatatct 118920 
ctagagaaat tattcagtga gaaaaaaaat ctgacagagt aattgcttca tttttgcata 118980 
tctgtgaaat cccttaggga aataaagtca tcatacaaat attataaatt attcctgtat 119040
ttgtcaccag aaaagccatt tgatattctt tgtaaggata gctcttccct tattcataaa 119100 
taagtttctg catgtgtttg taatcctgga acacttgttg tacaatcata tgtattttca 119160 
gagttagata tatgattgtg atgattaaat gactaggtag aaagaaaaat gccaattacc 119220 
agaaaaatgt agacacttag catttaaggt acttttattt gttaaagtct tgaataatga 119280 
ggatggaagt taatggcata aaaatataag aggcatgctc taggatcttt cactcaatat 119340 
aaatgaaagc taatatttat taagggtttg ccacacattg ggcacagtgc tatgcatatc 119400
acatacccca ttttgtgaaa tccgaaaaag agtgcttttg tatttgctaa tttcatctgt 119460 
aatagaaaaa aattatagtc cagaattatt aagaaacatg acccagacta ctcagatcag 119520 
aagttctgac atcagaatgt gaactcaacc agtcgactcc caaacgtatg tttctaccag 119580 
tacagtatgc tttatggttt gtagtggaat ttctttctgt actaaccatg agggaaatat 119640
gttattatcc atatctatta taggcaaaat gtcatagaat tggtttgagg gttgaatgtg 119700 
ttaaaactta taaaatagat tagtgtttgg cttataagaa acaccatgta agtgctggtt 119760 
aaattagtgg taaaactaaa acacagaata aggaacatgt caaaagaaca gagcagcatt 119820 
tcagaaatat ctaactccag atcctgtgaa ttgattttat gctaaggcta tcatattttt 119880 
atcaaggcat tcatctttgt gtttggtcta gtcctaaaac ttaagaatgt caaccggatg 119940 
tgtgcatttg tcaaatacac aaagttgggc actttaaata tatgaatttc actgtatata 120000 
aattattgca taaacaacat taaacattaa acaaagaaac aaaggtgaaa tctgacagga 120060 
gcttgatata taaaaatgaa tgaaggtatg tagggaaaag aaagagagat cagactgtta 120120 
ctgtgtctgt gtagaaagga aagacataag agactccatt ttgaaaaaga cctgtacttt 120180 
aaacaattgc tttgctgaga tgttgttaat ttgtagcttt gccccagcca ctttgaccca 120240 
accactttga cccaacctgg agctcacaaa aacatgtgtt gtatgaaatc aaggtttaag 120300 
tgatctaggg ctgtgcagga catgccttgt taacaaaatg tttacaagca gtatacattg 120360 
gtaaaagtca tcgccattct ctagtcttga taaaccagga gcacaatgca ctgtggaaag 120420
ccgcagggac ctctgccctt gaaagcggag tattgtccaa ggtttctccc catgtgatag 120480
tctgaaatat ggtctgaaac caggggcaca ttgcactgcg gaaagccgta gggacctctg 120540
cccttgaaag cggggtattg tccaaggttt ctccccatgt gatagtctga aatatggcct 120600
tgtggcatga gaaagacctg accatccccc agcccgacac ccgtaaaggg tctgtgatga 120660 
ggaggattag taaaagagga aagcctcttg cagttgagat agaggaaggc cactgtctcc 120720 
tgcctgcccc fcggggactga atgtctcggt ataaaacccg atagtacatt tgttcaattc 120780 
tgagatcaga taaaaactgc cctatggtgg gaggtgagac acgtttgcag caatgctgcc 120840 
ttgttattct ttactccact gagatgtttg ggtggagaga aacataaatc tggcttatgt 120900 
gcatgtccag tcatagtacc ttcccttgaa cttaattatg acatagattc tattgctcac 120960 
atgtttgttg ctgaccttct tattatcacc ctgccctcct actacattcc tttttgctga 121020 
aataatgaag ataataatta ataaaaactg agggaactca gaggctggtg caggtccttg 121080 
atatgctgag tgccggtccc ctgggcccat tgttgtttct ctgtactttg tctctgtgtc 121140 
ttatttcttt tctcagtctt tcatcccacc caactagaaa tacccacagg tgtggagggg 121200 
caggccaccc cttcaaaggt acatcactac tatgattgaa tagatgtaga cacagctttt 121260 
actcgatgtt gtataaatca caaataaaat ctttgtttca gatattcaaa aataaccttc 121320 
tataagatgt tcttgtgaat atagatgatt tttatataga aaaatagaag gtacatttca 121380 
ttaatatcag tgttaacagt ttttgttgtt gttgtaagta acagaacaca gctctggtta 121440 
tgctaagcaa aaccggaaca ttctgaaagg atgttagggc tcctaaatca atgtgcggtt 121500 
aaaagaccag gcttagagaa cagatagcag cctaggcagg tgtggaggct gagcagatca 121560 
aaccacacag aatcagtgtt tctttgcaac atcagcctga tcttcaaacc tcactgtaat 121620 
gaattagatg aaaaaaaatt ttgtcaaact tgagtcatta actcactccc ttaagacaag 121680 
gagaggctag catctgaccc ctgcttaagc ctttctaaac agagaatctc atctctcagt 121740 
ggaaactaga agtggaagat caagaggaca gaatttccat gactctcaac ttgacacatt 121800 
ggaaaatttc ctccaaaaaa ttcaagtgga gtgttaatgg ctggaattta atagctgtgc 121860 
aaccctcgaa acgcaaatat gtagatattt ttcaaaataa ggggaacaaa tcatgccctg 121920 
cctggaaaat ctctattgca ttgagtctgg aggaagaaga aagcaacaag tgagccccgc 121980 
ttgtttgctc aggcttctcc aagaggggat tgaactggtg aaatctattc agaaataaca 122040 
agatcattca gtcaatattg ttctttgata tcttgttctt tcctctctct aaactcattc 122100 
tctgccagaa ttataatcag gaataaagaa attggttagt tttaaatgtt atgttttggg 122160 
gcaagtccac ctataaagta catttctcaa atataaagag gggcagaata agagaaggag 122220
aaagaagaga agaagaatag agacagaagc tgaattagaa gggaatgaaa agaaagacaa 122280 
agggacataa aacaagaaaa ataaagggca gtatcaaaaa aggcagagag ctgctttgaa 122340
tatccctact ttagtttacc tgaaacaaag ttcattagaa ataaaatgag gaaaactgct 122400 
gaaaagaggg gaaaactttt ggtaagaaga ttattactct actggtaata atagtaataa 122460 
tcatgttgct attgtattgc aatcaatata tatgtaaagt tcctttctcc cactgaagga 122520 
aattatgaga aggaacttca cgttggtgac tgagttcatt ctcctgggac tgacgaatca 122580 
ccaggaatta cagattctcc tcttcatgct gtttctggcc atttacatgg tcacagtggc 122640 
agggaatctt agcatgattg ccctcatcca ggccaatgcc cggctccaca cgcccatgta 122700 
ctttttcctg agccacttat ccttcctgga tctgtgcttc tcttccaatg tgaccccaaa 122760 
gatgctggag attttccttt cagagaagaa aagcatttcc tatcctgcct gtcttgttca 122820 
gtgttacctt tatatcatct tggtacacgt tgagatctac atcctggctg tgatggcctt 122880 
tgactagtac atggccatct gaaaccctct gctttatggc agcaaaatgt ccaaaagtgt 122940
gtgttccttc ctcatcacgg tgccttatgt gtatggagcg ctcactggcc tgatggagac 123000 
catgtggacc tacaacctag ccttctgtgg ccccaacgaa attaatcact tctactgtgc 123060 
agacccacca ctgattaagc tggcttgttc tgacacctac aacaaggagt tgtcaatgtt 123120 
tgttgtggct ggctggaatc tttcgttttc tctcttcatc atatttattt cctactttta 123180 
catttttcct gctatcttaa ggattcgctc tacagagggc aggcaaaaag ctttttctac 123240 
ctgtggctcc catctgacag ctgttactat tttctatgca actctgttct tcatgtgtct 123300 
cagacctcca tcagaagagt ccatggagca aggacaaatg gtagctgtac tttataccac 123360 
tgtgatcccc atgttaatcc catgatctac agtctgagga acaaggatgt gaaaaaggct 123420 
ttatccaaag aactgttcaa aagaaaattg tttcctaaat aaacatcagt attgattttt 123480 
gtcatgctgt cattttattt agcctataat tttttcatag agcttagtgc aatacaaatt 123540 
tatccaaaat tatacatttt ccttgaaggg ttagggaatt tttatgaagt gtgaataaaa 123600 
gaaatatgag tatttacatc aattaacaga tagtttgata tcaatatagt tttaactgcc 123660 
cataccccaa agtaaaaatt tctatagtga ggaatcaatg taaataaaaa aaaatctaat 123720 
ttgtatttta gaaaataaaa cccttaaaac cagacctggg ttttcttttt ggaaggagtt 123780 
aggcatatag ttttaaactg cttcttaaac atatattaag ctattttttt tttttgaaac 123840 
agcatctcac tccatcaccc aggatggact ggagtgcagt ggcatgatca cagcttacag 123900 
caaccttgat ctcagggttt caaaccatcc tcccacttac acccctcacc cctgccacag 123960 
tagctgggac tacaggtgtg caccaccata ccacactaat ttttgtattt tttgtagaga 124020 
cggggtttca tcatgttgcc cagactggtc ttgaactgct gagtgcaagt gatctatctg 124080 
cctgggcctc caaaagagtt tgcattacat gggtgagcca ctgtgcccag cctgtaataa 124140 
ggtttaaaaa agagcacttc tgctaaactc tactagatac ttttgcctat cttacagagg 124200 
aagtcagcat tccactttac catgtccaac ctaaaacgta agcctcatcc cttgcacact 124260 
cagttgtcat ctcagctgtt tcctctgcct gtaccaacct catctccagc ctaaaacaca 124320
ccaatgcttt

tcttggcttt 

aagtctgtct 

catcaaccca 

gtggaggcct 

aagacgagag 

aggaaattcg 

acaacaagga 

tttaagaatt 

ctttgatggg 

cacacctgaa 

tgagaccatc 

gggcgtggtg 

ctgaacttgg 

gtgacagaag 

aaaacaaaaa 

atgacactgg 

cttcaaagat 

ataaaaagca 

ttataaaata 

atatatgtgt 

aataattata 

tgttttatag 

ttatttattt 

gcagggtcat 

ctggttttcc 

gagattaggg 

aagcacatct 

gagagcacgg 

tcttagtaca 

caatctgatt 

tcgtcatcat 

cgggcagagg 

ccgggcgggg 

ggccgggtag 

tgccccccac 

tggacagggc 

cggaggggct 

cagatgtggc 

agacggtcct 

acggggtcgc 

ggctcctcac 

ggtggcggct 

gaggtggagg 

gcactgagtg 

atcactcgcg 

tacaaaaacc 

gcaggagaat 

cggctgggca 

gtggagggag 

tttagagtgg 

aagtccctct 

ttcctcctca 

tgagtcctct 

gaggtggcca 

acattggcct 

ggattttgtg 

ttgacccacc 

tgtgtgacag 

ctcttcagtt 

gtaggcagct 

gcacctctga 

tccatgggct 



ggtccacagt 

ttagaggaaa 

aaggagaccc 

aaaggtataa 

atagttcccc 

actaatgatg 

gattattgaa 

tgaaactagt 

tatgacaata 

ctatcaaatc 

atcccagcac 

tgcctaacac 

gcgggcgccg 

gaggcgcagc 

gagactctgt 

caagcaaaca 

aataagcaaa 

atcccgcaga 

caaatgaata 

aagatattta 

gtatatatat 

tataatatgt 

gcattgaagc 

ttttagtatt 

aggacaatag 

taggcagagg 

agtggtgatg 

tgcaccgccc 

ggttgggggt 

gaacaaaatg 

tctctttctt 

ggcccattct 

ggctcctcac 

tggctgctgg 

gggctgcccc 

ctccctcccg 

ggctgctggg 

cctcacttcc 

ggcggccatg 

cacctcccag 

ggccaagcag 

atcccagacg 

gggcagaggc 

ttgtagcaag 

agtgagactc 

gtcaggagct 

agtcaggtga 

caggcaggga 

tcagagggat 

agggagaggg 

ataggtaggt 

catgagactg 

gtaaattttc 

gaactggcac 

tccatgttgc 

tgctatgcaa 

ctgtctacaa 

tcaaatttcg 

aagttcccaa 

cttcaatcaa 

gctggcatcc 

ccagcaacct 

tgatcagcac 



gaggcacact 

aataaatgac 

aaggctaaca 

catgtggttg 

tagaggtaac 

aggtgtgtgc 

gggctgcaac 

ccgtttaggc 

gattaacggt 

tgaagctaaa 

tttgggaggc 

ggtgaaaccc 

gtagtcttag 

ttgcagtgag 

ctcaaaacaa 

aaaaacaaaa 

tataaacata 

taagtttcta 

ccttcaactt 

atcattgaaa 

gtatgtgtac 

agagttccga 

attttacatt 

tattgatcat 

tggagggaag 

accctgcggc 

actcttaacg 

ttaatccatt 

aaggttatag 

gagtctccta 

ttccccacat 

caatgagctg 

ttcccagacg 

gcgggggctg 

ccacctccct 

gacggggcag 

tggagacacg 

cagacgtggc 

cggaggagct 

acggggtggc 

aggcgctcct 

atgggcggcc 

tgcaatctgg 

ccgagatgac 

cgtctgcaat 

ggagaccagc 

gcggggcatg 

ggttgcagtg 

accgtggaga 

agaccgtgga 

atataaattt 

tcttggagag 

tacagacctt 

catagttcac 

taaacagaat 

tagcatcata 

gtggtatttt 

gtttgtcttg 

tgctctgctt 

ctttttcctg 

agcccagagc 

caatctggcc 

cttcacaaat
tatgtttcca
cagtaggatt 
gataatctgt 
taatgatctt 
ctagaactta 
aaggtgagaa 
ctctagagta 
tttgtttcat 
catattaatt 
aaaacatgaa 
ccaggcggtt 
cgtctctact 
ctactcagga 
tggagattga 
acataaacaa 
cataaagagg 
agtcagagaa 
attgacagac 
attacagata 
taagtgtgaa 
atatatatgc 
gcatatctgc 
tttttttatg 
tcttgggtgt 
gtcagcagat 
cttccgcagt 
agcatgctgc 
taaccctgag 
attaacagca 
tgtctacttc 
ttccccctta 
ttgggtacac 

ggggggccgg 

ccccccacct 
cccggacggg 
ctggccgggc 
cctcacttcc 
ggctgccggg 
cctgacttct 
ggtcgggcag 
cacttcccag 
aggcagagac 
gcactttggg 
gccactgcac
cccggcacct 
ccggccaaca 
cctgcaatcc 
agccgagatg 
gagagggaga 
aggagaggga 
tcacattgaa 
ggttacagca 
ggcagcctaa 
tgggaaaggc 
aaaacaaaaa 
aactgcctga 
tctccacctc 
gttatcggcc 
catgaagtac 
aagttctcgt 
cacctcttcc 
aacacttttt 
cagaaaagaa 



cagtccatat 

ctaaattgaa 

tcaatgcatg 

gaagatgtgt 

agatataatc 

attagaacag 

ggacagagtc 

ggaggagtta 

cagcgtttat 

ataggcccgg 

ggatcacgag 

aagagtacaa 

gactgaggca 

gccactgcac

aaacaaaaac 

cccaggttag 

tgaattatca 

attttccagg 

ttaggattac 

aattaaaatt 

aaagactgat 

aggccgaagg

ttctcagcac 

ttctcgaaga 

aaacatgtga 

gtttgtgtcc 

cttcaagcat 

tggacacagc 

tcccaaggca 

tttctacaca 

tctatttgac 

ctcccagacg 

gcagaggcgc

ccctcccgga 

gcagctggcc 

aggggctgac 

cggatggggc

cggaggggct 

caggcagggc 

agacactcct 

actgggtggc 

gctcctcact 

aggccaaggc

tccagcctgg 

cggggggctg

cggcgaaacc 

ccggcactcg 

gcggcagtac

gggagaggaa 

gagggagagc 

aggaagagga 

gagaggctgc 

tctctacagt 

attcatccat 

ttcactttgt 

aaggtcttat 

agaaataaaa 

actgcttata

cctttaggac 

gtggggatct 

aggcccagga

tcagcatata 

accacaaccc 



ctggtttatt 
tgttacctaa 
tctaaaacca 
gtttttgggt 
acatacaagg 
agatgccctc 
aacaacagca 
aaaagtctta 
atttacactg 
cgtggtggct 
gtcaggagat 
aaaatcagca 
gaagaatggc 
tccagcctgg 
aagcaaacaa 
aagaaacctc 
actactcaga 
aaaataagaa 
ttgatgcaaa 
cgaagaccat 
tatataaagc 
agagactctg 
ctttatttat 

gggggatttg 

acaagggtct 
ctgggtactt 
ctgtttaaca 
acatgtttca 
gaagaatttt 
gacacagtaa 
aaaactgcca
gggtggcggc
cccccacctc
tggggcggct
gggcgggggc
ccccacctcc
ggctgccagg
cctcacttct
agccgggcag
cagttcccag 
cgagcagagg 
tcccagacag 
aggcagctgg 
gtaacattga 
aggcgggcag
caccaaaaaa 
gcaggctgag 
agtccagcct 
gagggagacc 
attgaagcat 
agaaacagct 
ctttctctcc 
cctgcagcag 
caagttcaga 
catctgcctg 
tattttcaaa 
tattttagga
aaaaacaacc tttcaaagtt agtaaggcta ttggtggtgt taaaatatca cccggttaaa 128160
aaacaaacaa actgatagca gaaaaaaatg gcaaaatatg aactggtgct atataaaaac 128220 
atatttaaag aaccaagtaa cttatgactt tgtgctcaac ctgaataata attagggaag 128280 
tgtacattac aagatatctg tatctttata tatctgtacc tatatatctg tgtctgtatg 128340 
ttgcagaatg acagaaaaac tgatattgtt aagcattgga gagactgtgg aacaactgac 128400 
cctctcatat aatgctagtg ggaatgcaaa tggatataaa cagtatggaa aactactagg 128460
cagcataaat ttaagctgca catatttata ctccataaac caacactact gcccaaagta 128520 
ctaacaggca gaaatacatg aatatgttcg tgaagagata tgcacaagat tgaattttgt 128580 
aacagccaca cctggaagta tcctatatgt gtatcaatag taggaaggat ttattaattt 128640
tggtgtactt actatacagc cgtggagatg aaggaattat ttacatattc atgacactcc 128700 
tcattttgaa taaaaatcac caacataata tttagcacca atgtagtgtt cattggaagg 128760 
ctagacctaa cagaatatgc actgtataat ttctattatg gaaaactcaa aagcagacac 128820 
aattaatgta cagtgttaaa agtgaagatg tagttaattt gaggtfcagtg actaggaaag 128880 
ggcaagcaga gagatagtgc agtactgtta atgttctact tgaggattta gaggcaagta 128940 
atacaagtat gttcactttg taaaaattca tgaagctgta cgcctacatt ttgtgcacac 129000 
tttctgtgtg taaattagac ttcaatataa aggttactaa aaacgaataa aaatagtact 129060 
agacttcaag caagtaaagc ttcattccaa tatcaaagca ttctatttac ccatcagtac 129120
acagagggta ttagtttgct agggctgcca caaataagta ccatgaactt ggtgacttaa 129180 
acatgcagat ttatttcctc acagttctag aggctagaag tccaagatca aggtgtgggc 129240 
aaaactggtt tcattctgag ttctttttct atcttgtgga tgatcatctt atcccgacct 129300 
ctttacactt tctttttctg tgtgtatctg tattctaatc tcttcttata aggatgcaag 129360 
tcatattgga ttagggcaca gctcacccgc tagaccttat cttacttaaa tgttctcttt 129420 
cgatatcgtg tctccaacag tcacgctctg tggggcttgg agctggaact tcagcataag 129480 
aattttggag agtgagggaa gaggcacaat ccagtccata acacagatta agaaatgtaa 129540 
aatgctaata gaattttgac acaaagtttg tgacactggt aggagagaaa ctgtatcaga 129600 
aaagttgaat taagttgaaa gtaacatggt aaacctaagg caatgtgaga atccatggca 129660 
gtcaggaatg ttattgtatg gattttctaa tgtaagaagg aaaatgctga gactgaaaca 129720 
taaggcagag aaggaccaga gagttgtgag ttcccatttt aaatttgtgt tgtgccaaat 129780 
gtcatctctc tagagaaatt attcagtgag aaaaaaaatc taacagagta attgcttcat 129840 
ttttgcatat ctgtgaaatc ccttagggaa ataaagtcat catacaaata ttataaatta 129900 
ttcctgtatt tgtcaccaga aaagccattg atattctttg taaggacagc tcttccctta 129960 
ttcataggta agtttctgca tgtgttttta atcctggaac tctacttgct atacaatcgt 130020 
atgtattttc agagttagat atatgattgt gatgattaaa tgactaggta gaaagaaaaa 130080 
tgccaattac cagaaaaatg tagacagtta gtatttaaga tacttttatt tgttaaagtt 130140 
ttgattaatg aggatggaag ttaatggcat aaaaatataa gaggcatgct ctaggatctt 130200 
tcactcaata taaatgaaag ctaatattta ttaagggttt accacacatt gggcacagtg 130260 
ctacgcatat tacatactcc attttgtgaa atcctaaaaa tagcactttt gtgtttgtta 130320 
atttcatcag taatagaaaa aaactatagc ccagagttat taagaaatat gacccagact 130380 
actcagatca gaagttctga catcacagtg tgatctcaac cagttgactc caaagcacat 130440 
gtttctacca gtacagtatg ctttatggtt cgtagtggaa tttccttctg tactaaccat 130500 
gagggaaata tgctattatc catacctatt acaggcagag tgtcatagaa tcggtttgag 130560 
ggttgaatgt gttaaaactt ataaaataga ttagcgtttg gattataaga aacaccatgt 130620 
aagtgctggt taaattagtt gtaaaactaa aacacagaat aaggaacatg tcaaaagaat 130680 
agagcagcat ttcagaaata tctaactcca gatcctgtga actgatttta tgctaagcct 130740 
attgcatttt tatcaaagca ttcatctttt tgtttggttg agtcctaaaa cttaagaatg 130800 
tcaaccagat gtgtgcattt gtcaaacaca caaagttggg cactttaaat atgtgaattt 130860 
cactgtatat aaattattgc aaaaacaaca ttaaacaaag aaacaaaggt gaaatctgac 130920 
aggagcttga tatataaaaa tgaatgaggg tacatcacta ctatgattga atagatgtag 130980 
acacagcttt tactcaatgt tgtataaatc acaaataaaa tctttgtttc agatattcaa 131040 
aaataacctt ctataagttg ttcttgtgaa tatagatgat ttttatatag aaaaatagaa 131100 
ggtacatttc attaatatca gtgttaacag tttttgttgt tgttgtaagt aacagaacac 131160 
agctctggtt atgctaagca aaaccggaac attctgaaag gatgttaggg ctcctaaatc 131220 
aatgtgcggt taaaagacca ggctcagaga acagatagca gcctaggcag gtgtggaggc 131280 
tgagcagatc aaaccacaca gaatcagtgt ttctttgcaa catcagcctg atcttcaaac 131340 
ttcactgtaa tcaattagat gaaaaaatat tttgcaacct tgagtcatga acttacttaa 131400 
gacaagcagg gactaacatc tgacccctgc ctactccttt atggtttttt tttttttttt 131460 
tttttttttt tttttttttt tttttttttg agacagcatc tggctctgtc gtccaggctg 131520 
gagtgcagtg gcaccgtctt ggctcactgc aagctccgcc tcccgggttc acgccattct 131580 
cctgcctcag cctcctgagt agctgggact acaggcgcct gccaccacac ctgccctgac 131640 
tactcctttc taaacagaga atctggtctc tcagtggaaa ctagaagtgg aagatcaaga 131700 
ggagagaatt tccatgagtc ttaactccac ataatggaaa atttcctcca aaaaattcaa 131760 
agagagtgtt actggctaga gaatttaata gctgtgcaac cctcaaaagg caaatatgca 131820 
gacacttttc aaaataaggg gaaaaaaatc atgccctgcc tggagaatcc ctattgcact 131880
aagtgataac aagatggtgt cttttaaggt ggctgtttca agctgctgaa atcctgctct 135720
tttatggaca cagagtcctc tagtaagaac tgatagtgga agagtgactt tgttatgtcc 135780 
ttatctggtt ggatgcagtc tttcttgatt aggcaaaaca tctggccctt gttggcatga 135840 
tgcctttaaa aatgtaagat gaagtcattt tctaagatgg agtaacttat atcaatgggg 135900 
ctctatacta cgattcagcc ccaggttccc ttctacactt tcctttaccc ttaccatttc 135960 
agccttagct gaagagctgg gtgcagtggc tcatgtttgt aatcccagca ctttgggagg 136020 
ctgaggtggg tgaatcacca gaggtcagga gttcgccact agcctggcca acatggtaaa 136080
accctgtttc tactaaaaat acaaaaatta gctgagcatg ctggtggaca cctgtaatcc 136140
cagctatttg ggaggctgag gcaggagcat tgcttggacc caggagacag aggttgcagt 136200 
gagccgagac cgcactgttg ccctccagcc tgggcaacaa gagtgaaatt ctatctcaag 136260 
aaaatagact tagctgaaga attatctctt cctgcaaggt ttctttaaca cattcaggat 136320 
gtgtagtgtc aactgaggaa tgatgagatt cataaattta gaaaggtgga ctttctcata 136380 
aagggttgta gcctgtaggg tgaccgttct gacaggctgt gaagcatatc ctccagctag 136440 
aagttggaaa gagacacttc gacagtatga agagtaagac agggatttat gctgaatggg 136500 
atgaccaaat atattacaca tatatgataa tacatattca acaggctata gaaaaaacta 136560 
tgaatattca caaagaagag gcacaggcat gaatagtagg ctaatataag caacatgcat 136620 
cccatgttcc ctttggagtg gggacttaac atttaaatgt gtcatgatta gactctatgc 136680 
accaaaaggt gaatcagaag acaccaagac cctctgtgca cagcctctgt tgactggcaa 136740 
gagccactag gttattggtg gtctcttatc aagaaggaat gctggtcaat tgctgtgttg 136800 
aaaccgcaaa aagagaagtc cagtgtcagg tggtttgcag atatcagtgg tggtgcgagt 136860 
ctcacaaggg caggtttctg tttaaccctt attgtaggaa gcctaatggt gtttagcaag 136920 
ggaggggaga taacgaggca tgtctgatct cccatctgtc atggcaggaa ctcagatttt 136980 
aaagtttttc tgaggtttcc ttgaccaaga ggcagtctgt tcaattgatt caggggttag 137040 
gattttattt gtatttctca ttaggtacca catgatgatt cacccacaga ttcacaatta 137100 
tattattgca ttcataacat ggttctataa ttatatgcat gtaaatctgt tccttccact 137160 
attttagcaa gttctcaaag gaaaggacca catctttttg tttttatatt tttaccacct 137220 
taagatagtg ttctataaag ggaggatgcc catttttttt ttgaaactgt gagaacaatc 137280 
ccttccactt tctacctttg tctgtaatta tggccaatta cagatttctc tccatgatac 137340 
tcgcttctcc catcctaaca tatattcaag gcagaacaat agatcattta gtttaagaaa 137400 
accatgttca agttctttat cataaatggc ccactgaaag cccagcaacg tgaatcataa 137460 
ctgagcaaga attggagaaa gtaatttcat tggcagcaga caggaaagat cacatactac 137520 
atcctattct tcatagcaga gagacagata acaataaatg ctgaactaca gtaaaagatg 137580
ttaagggaaa tatttgtgag gaataaatct ttgtagcaat gtattttcct tgatatgcaa 137640 
tcataattat ccatagcatt tgggaaacaa ctgacaattt ttatcacctt tataaatgtt 137700 
agttttgatc tttatagcaa ggttataatg gaaaaatcaa ccactgtgta ataaaattat 137760 
tttaaaatga acagaattac actaggctgt ctgggacaga ggcaagggaa gggctgagtc 137820 
atgatttaag gtgcaggaaa caaaagggac ctcattgttg ctcagacaga aaagaggttt 137880 
ggtgggaatc agacaacagg tatatattga gacaacgaaa tatccaatcc ttgaaaaagt 137940
acattcttgt gcatcacttt ttcatagcca accgtcctaa gatttatgcc atgtgataag 138000
ctgatgataa aacattctct tcaagttgaa acagaataca gttgtggaaa aatatttggc 138060 
tttgtaatta ttcagaactg agcctgaatc ctagttttat tactattttg ttagctgggt 138120 
aaacttagag aagttccctc tccatatcag tttattcaaa tgcaaaaccc catttcatgg 138180 
agttattgtg aatatcaaat attattttta atatacattt tcctcaattg cattttgagg 138240 
caagtatgct gagttccagt gtggactcaa tttcatgaaa gttttttaac atgggaaaca 138300 
tgatcaaaac aataagttaa atatgttagt tattcattta tttaacatat gattattgtc 138360 
cctgtcagtc actttcatca ttggcagtca ccagtctctt ctacctgttt cattgtttct 138420
tcatcagtct cttcctgttg tattgtttca tctatctgtt tctaaaacat ttcatgtttt 138480 
ttccagaaat tatttgtacc tagtattatt tgtttacata tcttaggcat cccatcaaaa 138540 
tgcaaatccc tttgttgaag aaatccttta aaaatatttt tattattcag attaaatagt 138600 
attgaagttt gtacttagta catattcttg tgtgaacttg ataaaggaca aacaatggag 138660 
gaaatatggt agtgctcttg agatgggaag gacaattacc tgaattcagg tttcatgaga 138720 
agttaagtgt cctcagacta ttttaaattt gtttttcagt attgtataaa gtgctaccct 138780 
cgttggataa caggtgattg ctttgagctc tactgaagta taaaatatag atttttttct 138840
aacatcttga ttcaagataa aatagtacaa ccaattagta tttcccttga gaatgttctc 138900 
cactgtgatt tcacttctaa ttacctcttc ttacacattt ggggattctc ctgagatata 138960 
tgctgatgcc agtgatgtgg gtatatatct tccagcactg gccagaattt gacctttggt 139020 
gtcatgaaaa agacaccctt attttcacag gtaagaaagg aaatgttttt gcatattatt 139080 
tttcagtatc taataatgtc tgtattcact ttgttaaatt gtgtacattt tagacagcaa 139140 
tgtgtttaaa taggtagttt tgaggcaaat ttgcacagac aatgcatttt ccatgaaaga 139200 
aaatctgtag ggagcttact tagcttcatc tttcactttt ttgatattac cagtgttatt 139260
gaaacatcct ggtggaattt catagtatgg ctatttcagc agatgttttc actgttatta 139320 
gtttcaagta atattatttt tgtgttttcc atttgcctaa tgtcttggtg cctactgtca 139380 
ccaatcgcca caagaaaaag gcaagttgaa aataatttaa gtctacatag cttaattaat 139440
cacataaaat ttttctgacg tctaagtaaa ataaatgctc aaacaggctt tttattgtga 139500
ccattaaaca tgtgtgaaag tggtccatta gcttcagtat gtggagctaa cgaacttggg 139560
tgagtagatc ttaagttctc ttgggttctc ttattttata agattctggg ttcttttcca 139620
atgccctatt tttgtctaga gacctagttt ttttacaagc aatacttgtg cctcagtgag 139680
actttagttt tcagaatatt attttacttt tttcaacaaa atacttgaaa cataaatcag 139740
cctgattata atcaaatcag tctgattata atcaaatcag tctgattcta atcaaataac 139800
taacttgtgt caaggacctt aaacagtaag ttgatgaatg aaattctcat acctattttt 139860
ctaggctgca caagtagtca cagaatcatt ctcgggtcat tatcaaattc atcatcatca 139920
aagttattga ctgcatttga atgtttgttt tgagtgtttc ttgcctctca gcagcagtgg 139980
acagatttct catgggctct atatctgtat attagctgtc tctgattgcc attgtcccgc 140040
aatgctagaa ttaaaatatg gttaaygata atggtctcca ctaaaagtgc atgtccagat 140100
gaccagtgaa acctgagcaa atttgcaaag ctccttagag actaagtcag gactaaaacc 140160
aagaccctgg attgaagaag ttgctcacca ccagagagtt aaaagaagcc cttgtgcctt 140220
ccttcttgcc aattaagtta caattcatag gcattattct agattcttgg cccagatttt 140280
tcatattctt tacagtttcg gcactgatac ctcggctggc tcgctttctt cttcctcatc 140340
catctcttat ctttgttaga gtgcttgtca ctatcatctg taactcaaag acaactctaa 140400
ctcaaagtcc tcatttattc atatttgttt cctgggacag gataaggagg caataatatt 140460
agataaattg aattttaaca ttttggaaac aatttctgtc attacccaac cagtcaatgg 140520
aaccaaagga agagatagac ctttctggga aagtaaaata ttacagaagg agagcaccgt 140580
tctgagaatt ataatatatg atttgtcttc tgattcttct gcttatgagt tgtaagctgg 140640
gaccagaatg tcaaattttc tgagcctggt tactcaccta actaaattta tttctactat 140700
attgtccact gctctatttc tagcatccag caccatgtct ggaacttgaa aaatgctcaa 140760
taattatctg ctataagcac gcatgcatac acaaatgcat ggctgacaac atattacaaa 140820
gtaaattagt gtttttttaa ctataaaatt tcactttaat aagagtttag caattgttgt 140880
aagatctcat acaatttaat agaaaatatg tgtctgatat tcttagatta ggtgaaaaat 140940
aaagtgctac tcgtattgac ttcctgagac tgctcagaag agataagttt tatcagcttc 141000
tgacaagttc tccactaatg acatttgctc atgtctgaat tgccccatgg aaaaaaatta 141060
gatgtaaatt tcattttacg caattgttag tatactgaaa ttgtttttaa ccaaaatttg 141120
gagacagatt aacacatatg atattttaac ttaatttgat attttatata tttaarataa 141180
ttfcgtgataa attcagtaaa tgcataatgt ctatataaat tgtaagctac aaaatataat 141240
gaatacacat taaaccactc cctagaatta taagtaattt ctgtgcattg tttaaagtaa 141300
atgcacaatt taataatttt cccatttaaa taatctataa tccattaagc caatagtaaa 141360
tactaaaata gtccctcttt ccaaccctga tctgaaatgt catctttgtt atatgccaca 141420
tcacctatat acatgtagct gttcctggac tttctattct ggaacattag ttaaagtgta 141480
tatttttttc caatttcatg tcaataacac actgtcctga atactggttt tacagcaaat 141540
cttgatatct aataagtaac tctgcttatt gccctgagag atattgagtc ttacggttat 141600
tttatacttc cgtataaata ttagaatcag ctattccaca agaattgtgt ttgaattttg 141660
attggaattt atttgactct atgggtcact taggaaagac tgaccttttt atgctattga 141720
atccttccct tagtgaacaa agaatctctc tccattttta gttctttctt tctctctctt 141780
ttctttcctt tcctttcctt tcctttcttt cctttccttt cctttctttc tttcttcttt 141840
ctttttcttt ctttctttct ttcttccttc cttccttcct tccttccttc cttccttcct 141900
ttctttgttt ctgagacgaa ctctcgcctc tctttgccac ccaggctgga gtgcagtggc 141960
atgatctcag tgcactacaa cctctgcctc ctgtgttcaa gtgattctcc tgcctcagtc 142020
tcctgagtag ctggggttac aggtgcacac ctccatacct ggctgatttg ttttattttt 142080
agtagagaca gggtttcacc atgttggcca ggctggtctc gaactcatga ccagaagtga 142140
tcaacccgtc ttggcctccc aaaatgctgg gattataggc ataagccacc acatccagcc 142200
catttttaag gctttttaat gttgtgtggc aatggcatca aagtgatcac ataatttatt 142260
ctgggtatat tagcagggag taagtataga ttgagaacct gaaaacattt ttttaatcat 142320
cttagggtga ttgagttaat tccttttatc ctaatggcta aactacacat cagggttaac 142380
ataatggagg aataggaaat tctttgatat aggacttctt tatacagaga tataaaagag 142440
catgtgagcc aagcagggga ctataatgtt ggttctatgc agcatgaatg ttgtaaaata 142500
gcaagaaatt aaaatatagg aacaaattaa aacatagatt gatacctcaa agataataca 142560
tatgaaattt catatagaat tgagatattg atgaaggagc ataaatacac ttgtttaacc 142620
caacagacag acaggctaaa ttgggatgtg caaattggaa aaaatgctta taaacacgtg 142680
atatcaatgt ttaagactca aacgaaaggc tttatttaca taagttccca aaaactaact 142740
tataaacaaa gaaccaatat acatatcaca atgaaaagca ctactactaa aaacaacaaa 142800
atataaaaat acaccgagac agtaagttga actcggtgag atatatagta tattagcaga 142860
tcaccgaatg gggatgtatg agataagcat ctcagtggga ttagtcactt catctctcta 142920
ggattcagtc tctcagtctt cagaatgaag agctttgcct gcattatcct caaagtcctt 142980
tctacctctc catgtagctt tgtatgtata tgtaagcatg tcattagaag ttctaagcat 143040
ttcaaaatct ctattacaat ttaatcttta acatataagt tatctagatt tttaaaagaa 143100
atttcccaca gcataaggat actttcgttt ataattttgc attaagcctc caacttaatt 143160
gcacactgtg attacaaatt cagcctcagt ggaaaccact tgttgcaaat gtttagactt 143220
gcttcatagt ataactgttc atcttcagtt acagaactgc tactgagata acataactaa 143280
agccttttgg ctctttttat acaaagcatg atatttaact agggttttag tgatttttaa 143340 
aaagtttctc tttctcctta gatattcaga ccaatgcgtc tcatatgaga tgaagaaatg 143400 
tccagaagaa actatactga actgacagaa tttgttctct tgggtctaac aagccgtcca 143460 
gagctgcgag ttgctttctt ggcactgttc ctttttgtct acatagccac tgtggtagga 143520 
aacttgggga tgattatttt aatcaaagtt gattctcgac ttcacactcc catgtaattt 143580 
tttctctcca gtttgtccat tctagatctg tgtttctcca caaatttcac tcccaaaatg 143640 
ctagaaaatt tcttatcaga gaagaagacc atttcctatg caggttgttt gatgcagtgc 143700
tatgttgtca ttgctgtggt ccttgcagag cactgcatgt tggcagtcat ggcatatgac 143760 
cgctatatgg ccatctgtaa tccattgctc tacagtagca aaatgtccca aggtgtttgt 143820 
gtccacctgg tcattgtccc ttatgtctat ggctttcttc tcagtgtgat ggaaacctta 143880 
aggacctaca acctctcctt ctgtggaaca aatgaaatca accatttcta ctgtgctgat 143940 
cctcctctta tcaaactggc atgctctgac acgtacagca aggagctgtc catgtacata 144000 
gtagccggct acagcaacgt ccagtctctt ctratcattc tcacatccta catgttcatc 144060
cttgtcgcta tcctcagaag ccattctgca gagggaagga aaaaagcttt ttccacatgt 144120 
ggttcccacc tgacagttgt cacaatcttc tatggaaccc tcttctgcat gcatttgaga 144180 
cgtcccacag acgagtccgt ggagcagggg aaaatggtgg ctgtgtttta caccacagtg 144240 
atactcatgc tgaactccat gatctatggc ctcaggaaca aggatgtgaa agaggcgttg 144300 
aaaaaagcaa taggaaaaca aacattggga aaataaaaat gctaagctat cattaaaaat 144360 
ttgtgaagta atgagatata atatcattgg gttagatgtc acattttagg ctacatttgc 144420 
acaattcatt tctaattttc tgttaggtag ctgactgagt 144460 



<210> 2 

<211> 195 

<212> DNA 

<213> Homo sapiens 



<400> 2 

atg agc ttc tta ata aga agt gat tca aca cta cac act cca atg tgc 48
Met Ser Phe Leu Ile Arg Ser Asp Ser Thr Leu His Thr Pro Met Cys
1 5 10 15

ttg ttc ctc agt cat ctc tcc ttt gta gat ctc tat tat gcc acc aat 96
Leu Phe Leu Ser His Leu Ser Phe Val Asp Leu Tyr Tyr Ala Thr Asn
20 25 30

gcc act cct ccg atg ctg gtt aac ttt ttt ttt cca aga gaa aaa ccg 144
Ala Thr Pro Pro Met Leu Val Asn Phe Phe Phe Pro Arg Glu Lys Pro
35 40 45

ttt cct tta ttg gtt gct tta tcc aat ttc acc ttt tca ttg cac tgg 192
Phe Pro Leu Leu Val Ala Leu Ser Asn Phe Thr Phe Ser Leu His Trp
50 55 60

tga 195
*
65



<210> 3 
<211> 948 
<212> DNA 

<213> Homo sapiens 
<400> 3 

atg ttc tcc cca aac cac acc ata gtg aca gaa ttc att ctc ttg gga 48 
Met Phe Ser Pro Asn His Thr Ile Val Thr Glu Phe Ile Leu Leu Gly 
1 5 10 15 

ctg aca gac gac cca gtg cta gag aag atc ctg ttt ggg gta ttc ctt 96 
Leu Thr Asp Asp Pro Val Leu Glu Lys Ile Leu Phe Gly Val Phe Leu 
20 25 30 

gcg atc tac cta atc aca ctg gca ggc aac ctg tgc atg atc ctg ctg 144 
Ala Ile Tyr Leu Ile Thr Leu Ala Gly Asn Leu Cys Met Ile Leu Leu 
35 40 45 

atc agg acc aat tcc cac ctg caa aca ccc atg tat ttc ttc ctt ggc 192 
Ile Arg Thr Asn Ser His Leu Gln Thr Pro Met Tyr Phe Phe Leu Gly 
50 55 60 

cac ctc tcc ttt gta gac att tgc tat tct tcc aat gtt act cca aat 240 
His Leu Ser Phe Val Asp Ile Cys Tyr Ser Ser Asn Val Thr Pro Asn 
65 70 75 80 

atg ctg cac aat ttc ctc tca gaa cag aag acc atc tcc tac gct gga 288 
Met Leu His Asn Phe Leu Ser Glu Gln Lys Thr Ile Ser Tyr Ala Gly 
85 90 95 

tgc ttc aca cag tgt ctt ctc ttc atc gcc cta gtg atc act gag ttt 336 
Cys Phe Thr Gln Cys Leu Leu Phe Ile Ala Leu Val Ile Thr Glu Phe 
100 105 110 

tac ttc ctt gct tca atg gca ttg gat cgc tat gta gcc att tgc agc 384 
Tyr Phe Leu Ala Ser Met Ala Leu Asp Arg Tyr Val Ala Ile Cys Ser 
115 120 125 

cct tta cat tac agt tcc agg atg tcc aag aac att tgc atc tct ctg 432 
Pro Leu His Tyr Ser Ser Arg Met Ser Lys Asn Ile Cys Ile Ser Leu 
130 135 140 

gtc act gtg cct tac atg tat ggc ttc ctt aat ggg ctc tct cag aca 480 
Val Thr Val Pro Tyr Met Tyr Gly Phe Leu Asn Gly Leu Ser Gln Thr 
145 150 155 160 

ctg ctg acc ttt cac tta tcc ttc tgt ggc tcc ctt gaa atc aat cat 528 
Leu Leu Thr Phe His Leu Ser Phe Cys Gly Ser Leu Glu Ile Asn His 
165 170 175 

ttc tac tgc gct gat cct cct ctt atc atg ctg gcc tgc tct gac acc 576 
Phe Tyr Cys Ala Asp Pro Pro Leu Ile Met Leu Ala Cys Ser Asp Thr 
180 185 190 

cgt gtc aaa aag atg gca atg ttt gta gtt gca ggc ttt act ctc tca 624 
Arg Val Lys Lys Met Ala Met Phe Val Val Ala Gly Phe Thr Leu Ser 
195 200 205 

agc tct ctc ttc atc att ctt ctg tcc tat ctt ttc att ttt gca gcg 672 
Ser Ser Leu Phe Ile Ile Leu Leu Ser Tyr Leu Phe Ile Phe Ala Ala 
210 215 220 

atc ttc agg atc cgt tct gct gaa ggc agg cac aaa gcc ttt tct acg 720 
Ile Phe Arg Ile Arg Ser Ala Glu Gly Arg His Lys Ala Phe Ser Thr 
225 230 235 240 

tgt gct tcc cac ctg aca ata gtc act ttg ttt tat gga acc ctc ttc 768 
Cys Ala Ser His Leu Thr Ile Val Thr Leu Phe Tyr Gly Thr Leu Phe 
245 250 255 

tgc atg tac gta agg cct cca tca gag aag tct gta gag gag tcc aaa 816 
Cys Met Tyr Val Arg Pro Pro Ser Glu Lys Ser Val Glu Glu Ser Lys 
260 265 270 

ata act gca gtc ttt tat act ttt ttg acc cca atg ctg aac cca ttg 864 
Ile Thr Ala Val Phe Tyr Thr Phe Leu Thr Pro Met Leu Asn Pro Leu 
275 280 285 

atc tat agc cta cgg aac aca gat gta atc ctt gcc atg caa caa atg 912 
Ile Tyr Ser Leu Arg Asn Thr Asp Val Ile Leu Ala Met Gln Gln Met 
290 295 300 

att agg gga aaa tcc ttt cat aaa att gca gtt tag 948 
Ile Arg Gly Lys Ser Phe His Lys Ile Ala Val * 
305 310 315 

<210> 4 
<211> 519 
<212> DNA 

<213> Homo sapiens 



<400> 4 

atg tta aag aaa aac cat aca gcc gtg act gag ttt gtt ctc ctg gga 48 
Met Leu Lys Lys Asn His Thr Ala Val Thr Glu Phe Val Leu Leu Gly 
1 5 10 15 

ctg aca gat cgg gct gag ctg cag tcc ctt ctt ttt gtg gta ttt cta 96 
Leu Thr Asp Arg Ala Glu Leu Gln Ser Leu Leu Phe Val Val Phe Leu 
20 25 30 

gtc atc tac ctt atc aca gta atc ggc aat gtg agc atg atc ttg tta 144 
Val Ile Tyr Leu Ile Thr Val Ile Gly Asn Val Ser Met Ile Leu Leu 
35 40 45 

atc aga agt gac tcg aca cta cac act cca atg tac ttc ttc ctc agt 192 
Ile Arg Ser Asp Ser Thr Leu His Thr Pro Met Tyr Phe Phe Leu Ser 
50 55 60 

cac ctc tcc ttt gta gat ctc tgt tat acc acc aat gtt act cct cag 240 
His Leu Ser Phe Val Asp Leu Cys Tyr Thr Thr Asn Val Thr Pro Gln 
65 70 75 80 

atg ctg gtt aac ttt tta tcc aag aga aaa acc att tcc ttc atc ggc 288 
Met Leu Val Asn Phe Leu Ser Lys Arg Lys Thr Ile Ser Phe Ile Gly 
85 90 95 

tgc ttt atc caa ttt cac ttt ttc att gca ctg gtg att aca gat tat 336 
Cys Phe Ile Gln Phe His Phe Phe Ile Ala Leu Val Ile Thr Asp Tyr 
100 105 110 

tat atg ctc aca gtg atg gct tat gac cgc tac atg gcc atc tgc aag 384 
Tyr Met Leu Thr Val Met Ala Tyr Asp Arg Tyr Met Ala Ile Cys Lys 
115 120 125 

ccc ttg tta tat gga agc aaa atg acc agg tgt gtc tgc ctc tgt ctg 432 
Pro Leu Leu Tyr Gly Ser Lys Met Thr Arg Cys Val Cys Leu Cys Leu 
130 135 140 

gct gct gct ccc tat att tat ggc ttt gca aat ggt cta agc aca gac 480 
Ala Ala Ala Pro Tyr Ile Tyr Gly Phe Ala Asn Gly Leu Ser Thr Asp 
145 150 155 160 

cac cct gat gct tcg tct gtc ctt ctg tgg acc caa tga 519 
His Pro Asp Ala Ser Ser Val Leu Leu Trp Thr Gln * 
165 170 

















<210> 5 

<211> 948 

<212> DNA 

<213> Homo sapiens 



<400> 5 

atg ttg tcc cca aac cac acc ata gtg aca gaa ttc att ctc tta gga 48 
Met Leu Ser Pro Asn His Thr Ile Val Thr Glu Phe Ile Leu Leu Gly 
1 5 10 15 

ctg aca gac gac cca gtg cta gag aag atc ctg ttt ggg gtg ttc ctg 96 
Leu Thr Asp Asp Pro Val Leu Glu Lys Ile Leu Phe Gly Val Phe Leu 
20 25 30 

gcg atc tac cta atc aca ctg gca ggc aac ctg tgc atg atc ctg ctg 144 
Ala Ile Tyr Leu Ile Thr Leu Ala Gly Asn Leu Cys Met Ile Leu Leu 
35 40 45 

atc agg acc aat tcc caa ctg caa aca ccc atg tat ttc ttc ctt ggt 192 
Ile Arg Thr Asn Ser Gln Leu Gln Thr Pro Met Tyr Phe Phe Leu Gly 
50 55 60 

cac ctc tcc ttt tta gac att tgc tat tct tcc aat gtt act cca aat 240 
His Leu Ser Phe Leu Asp Ile Cys Tyr Ser Ser Asn Val Thr Pro Asn 
65 70 75 80 

atg ctg cac aat ttc ctc tca gaa cag aag acc atc tcc tac gct gga 288 
Met Leu His Asn Phe Leu Ser Glu Gln Lys Thr Ile Ser Tyr Ala Gly 
85 90 95 

tgc ttc aca cag tgt ctt ctc ttc atc gcc cta gtg atc act gag ttt 336 
Cys Phe Thr Gln Cys Leu Leu Phe Ile Ala Leu Val Ile Thr Glu Phe 
100 105 110 

tac ttc ctt gct tca atg gca ttg gat cgc tat gta gcc att tgc agc 384 
Tyr Phe Leu Ala Ser Met Ala Leu Asp Arg Tyr Val Ala Ile Cys Ser 
115 120 125 

cct tta cat tac agt tcc agg atg tcc aag aac att tgc atc tct ctg 432 
Pro Leu His Tyr Ser Ser Arg Met Ser Lys Asn Ile Cys Ile Ser Leu 
130 135 140 

gtc act gtg cct tac atg tat ggc ttc ctt aat ggg ctc tct cag aca 480 
Val Thr Val Pro Tyr Met Tyr Gly Phe Leu Asn Gly Leu Ser Gln Thr 
145 150 155 160 

ctg ctg acc ttt cac tta tcc ttc tgt ggc tcc ctt gaa atc aat cat 528 
Leu Leu Thr Phe His Leu Ser Phe Cys Gly Ser Leu Glu Ile Asn His 
165 170 175 

ttc tac tgc gct gat cct cct ctt atc atg ctg gcc tgc tct gac acc 576 
Phe Tyr Cys Ala Asp Pro Pro Leu Ile Met Leu Ala Cys Ser Asp Thr 
180 185 190 

cgt gtc aaa aag atg gca atg ttt gta gtt gca ggc ttt act ctc tca 624 
Arg Val Lys Lys Met Ala Met Phe Val Val Ala Gly Phe Thr Leu Ser 
195 200 205 

agc tct ctc ttc atc att ctt ctg tcc tat ctt ttc att ttt gca gcg 672 
Ser Ser Leu Phe Ile Ile Leu Leu Ser Tyr Leu Phe Ile Phe Ala Ala 
210 215 220 

atc ttc agg atc cgt tct gct gaa ggc agg cac aaa gcc ttt tct acg 720 
Ile Phe Arg Ile Arg Ser Ala Glu Gly Arg His Lys Ala Phe Ser Thr 
225 230 235 240 

tgt gct tcc cac ctg aca ata gtc act ttg ttt tat gga acc ctc ttc 768 
Cys Ala Ser His Leu Thr Ile Val Thr Leu Phe Tyr Gly Thr Leu Phe 
245 250 255 

tgc atg tac gta agg cct cca tca gag aag tct gta gag gag tcc aaa 816 
Cys Met Tyr Val Arg Pro Pro Ser Glu Lys Ser Val Glu Glu Ser Lys 
260 265 270 

ata att gca gtc ttt tat act ttt ttg agc cca atg ctg aac cca ttg 864 
Ile Ile Ala Val Phe Tyr Thr Phe Leu Ser Pro Met Leu Asn Pro Leu 
275 280 285 

atc tat agc cta cgg aac aga gat gta atc ctt gcc ata caa caa atg 912 
Ile Tyr Ser Leu Arg Asn Arg Asp Val Ile Leu Ala Ile Gln Gln Met 
290 295 300 

att agg gga aaa tcc ttt tgt aaa att gca gtt tag 948 
Ile Arg Gly Lys Ser Phe Cys Lys Ile Ala Val * 
305 310 315 



<210> 6 

<211> 918 

<212> DNA 

<213> Homo sapiens 



<400> 6 

atg tcc aac aca aat ggc agt gca atc aca gaa ttc att tta ctt ggg 48 
Met Ser Asn Thr Asn Gly Ser Ala Ile Thr Glu Phe Ile Leu Leu Gly 
1 5 10 15 

ctc aca gat tgc ccg gaa ctc cag tct ctg ctt ttt gtg ctg ttt ctg 96 
Leu Thr Asp Cys Pro Glu Leu Gln Ser Leu Leu Phe Val Leu Phe Leu 
20 25 30 

gtt gtt tac ctc gtc acc ctg cta ggc aac ctg ggc atg ata atg tta 144 
Val Val Tyr Leu Val Thr Leu Leu Gly Asn Leu Gly Met Ile Met Leu 
35 40 45 

atg aga ctg gac tct cgc ctt cac acg ccc atg tac ttc ttc ctc act 192 
Met Arg Leu Asp Ser Arg Leu His Thr Pro Met Tyr Phe Phe Leu Thr 
50 55 60 

aac tta gcc ttt gtg gat ttg tgc tat aca tca aat gca acc ccg cag 240 
Asn Leu Ala Phe Val Asp Leu Cys Tyr Thr Ser Asn Ala Thr Pro Gln 
65 70 75 80 

atg tcg act aat atc gta tct gag aag acc att tcc ttt gct ggt tgc 288 
Met Ser Thr Asn Ile Val Ser Glu Lys Thr Ile Ser Phe Ala Gly Cys 
85 90 95 

ttt aca cag tgc tac att ttc att gcc ctt cta ctc act gag ttt tac 336 
Phe Thr Gln Cys Tyr Ile Phe Ile Ala Leu Leu Leu Thr Glu Phe Tyr 
100 105 110 

atg ctg gca gca atg gcc tat gac cgc tat gtg gcc ata tat gac cct 384 
Met Leu Ala Ala Met Ala Tyr Asp Arg Tyr Val Ala Ile Tyr Asp Pro 
115 120 125 

ctg cgc tac agt gtg aaa acg tcc agg aga gtt tgc atc tgc ttg gcc 432 
Leu Arg Tyr Ser Val Lys Thr Ser Arg Arg Val Cys Ile Cys Leu Ala 
130 135 140 

aca ttt ccc tat gtc tat ggc ttc tca gat gga ctc ttc cag gcc atc 480 
Thr Phe Pro Tyr Val Tyr Gly Phe Ser Asp Gly Leu Phe Gln Ala Ile 
145 150 155 160 

ctg acc ttc cgc ctg acc ttc tgt aga tcc aat gtc atc aac cac ttc 528 
Leu Thr Phe Arg Leu Thr Phe Cys Arg Ser Asn Val Ile Asn His Phe 
165 170 175 

tac tgt gct gac ccg ccg ctc att aag ctt tct tgt tct gat act tat 576 
Tyr Cys Ala Asp Pro Pro Leu Ile Lys Leu Ser Cys Ser Asp Thr Tyr 
180 185 190 

gtc aaa gag cat gcc atg ttc ata tct gct ggc ttc aac ctc tcc agc 624 
Val Lys Glu His Ala Met Phe Ile Ser Ala Gly Phe Asn Leu Ser Ser 
195 200 205 

tcc ctc acc atc gtc ttg gtg tcc tat gcc ttc att ctt gct gcc atc 672 
Ser Leu Thr Ile Val Leu Val Ser Tyr Ala Phe Ile Leu Ala Ala Ile 
210 215 220 

ctc cgg atc aaa tca gca gag gga agg cac aag gca ttc tcc acc tgt 720 
Leu Arg Ile Lys Ser Ala Glu Gly Arg His Lys Ala Phe Ser Thr Cys 
225 230 235 240 

ggt tcc cat atg atg gct gtc acc ctg ttt tat ggg act ctc ttt tgc 768 
Gly Ser His Met Met Ala Val Thr Leu Phe Tyr Gly Thr Leu Phe Cys 
245 250 255 

atg tat ata aga cca cca aca gat aag act gtt gag gaa tct aaa ata 816 
Met Tyr Ile Arg Pro Pro Thr Asp Lys Thr Val Glu Glu Ser Lys Ile 
260 265 270 

ata gct gtc ttt tac acc ttt gtg agt ccg gta ctt aat cca ttg atc 864 
Ile Ala Val Phe Tyr Thr Phe Val Ser Pro Val Leu Asn Pro Leu Ile 
275 280 285 

tac agt ctg agg aat aaa gat gtg aag cag gcc ttg aag aat gtc ctg 912 
Tyr Ser Leu Arg Asn Lys Asp Val Lys Gln Ala Leu Lys Asn Val Leu 
290 295 300 

aga tga 918 
Arg * 
305 



<210> 7 

<211> 612 

<212> DNA 

<213> Homo sapiens 

<400> 7 

atg gtt aga gga aat tct act ttg gtg acg gaa ttt att ctc ttg gga 48 
Met Val Arg Gly Asn Ser Thr Leu Val Thr Glu Phe Ile Leu Leu Gly 
1 5 10 15 

tta aag gat ctt cca gag ctt cag ccc atc ctc ttt gta ctg ttc ctg 96 
Leu Lys Asp Leu Pro Glu Leu Gln Pro Ile Leu Phe Val Leu Phe Leu 
20 25 30 

cta atc tac ctg atc act gtc ggg ggg aac ctt ggg atg ttg gtg ttg 144 
Leu Ile Tyr Leu Ile Thr Val Gly Gly Asn Leu Gly Met Leu Val Leu 
35 40 45 

atc agg ata gat tca cgc ctc cac acc ccc atg tat ttc ttt ctt gct 192 
Ile Arg Ile Asp Ser Arg Leu His Thr Pro Met Tyr Phe Phe Leu Ala 
50 55 60 

agt ttg tcc tgc ttg gat ttg tat tac tcc act aat gtg act ccc aag 240 
Ser Leu Ser Cys Leu Asp Leu Tyr Tyr Ser Thr Asn Val Thr Pro Lys 
65 70 75 80 

atg ttg gtg aac ttc ttc tca gac aag aaa gcc att tcc tat gct gct 288 
Met Leu Val Asn Phe Phe Ser Asp Lys Lys Ala Ile Ser Tyr Ala Ala 
85 90 95 

tgt tta gtc cag tgc tat ttt ttc att gct gtg gtg att act gaa tat 336 
Cys Leu Val Gln Cys Tyr Phe Phe Ile Ala Val Val Ile Thr Glu Tyr 
100 105 110 

tat atg cta gct gta atg gcc tat gat agg tat gtg gcc atc tgt aac 384 
Tyr Met Leu Ala Val Met Ala Tyr Asp Arg Tyr Val Ala Ile Cys Asn 
115 120 125 

cct [illegible] ctt tac agc agc aag atg tcc aaa ggg ctc tgt att cgc ctg 432 
Pro Leu Leu Tyr Ser Ser Lys Met Ser Lys Gly Leu Cys Ile Arg Leu 
130 135 140 

att gct [illegible] cca tat gtc tat ggg ttt ctt agt gga ctg atg gaa acc 480 
Ile Ala Gly Pro Tyr Val Tyr Gly Phe Leu Ser Gly Leu Met Glu Thr 
145 150 155 160 

atg tgg aca tac cac ttg acc ttc tgt ggc tcc aat atc att aat cac 528 
Met Trp Thr Tyr His Leu Thr Phe Cys Gly Ser Asn Ile Ile Asn His 
165 170 175 

ttc tac tgt gct gac cca ccc ctc atc cga ctt tcc tgc tct gac act 576 
Phe Tyr Cys Ala Asp Pro Pro Leu Ile Arg Leu Ser Cys Ser Asp Thr 
180 185 190 

ttc att aag gaa aca tcc atg ttt gtg gta gca tga 612 
Phe Ile Lys Glu Thr Ser Met Phe Val Val Ala * 
195 200 



<210> 8 

<211> 807 

<212> DNA 

<213> Homo sapiens 

<400> 8 

ttg ccc tca tcc agg cca acg ccc cgg ctc cac acg ccc atg tac ttt 48 
Leu Pro Ser Ser Arg Pro Thr Pro Arg Leu His Thr Pro Met Tyr Phe 
1 5 10 15 

ttc ctg agc aac tta tcc ttt gtg gat ctg tgc ttc tct tcc aat gtg 96 
Phe Leu Ser Asn Leu Ser Phe Val Asp Leu Cys Phe Ser Ser Asn Val 
20 25 30 

act cca agg atg ctg gag att ttc ctt tca gag aag aaa agc att tcc 144 
Thr Pro Arg Met Leu Glu Ile Phe Leu Ser Glu Lys Lys Ser Ile Ser 
35 40 45 

tat cct gcc cgt ctt gtg cag tgt tac ctt ttt atc acc ttg gtc cac 192 
Tyr Pro Ala Arg Leu Val Gln Cys Tyr Leu Phe Ile Thr Leu Val His 
50 55 60 

gtt gag ctc tac atc ctg gct gtg atg gcc ttt gac cgg tac atg gcc 240 
Val Glu Leu Tyr Ile Leu Ala Val Met Ala Phe Asp Arg Tyr Met Ala 
65 70 75 80 

atc tgc aac cct ctg ctt tat ggc agc aga atg tcc aag agc gtg tgc 288 
Ile Cys Asn Pro Leu Leu Tyr Gly Ser Arg Met Ser Lys Ser Val Cys 
85 90 95 

tct ttc ctc atc aca gtg ctt tat gtg tat gga gca ctc act ggc ctg 336 
Ser Phe Leu Ile Thr Val Leu Tyr Val Tyr Gly Ala Leu Thr Gly Leu 
100 105 110 

atg gag act atg tgg acc tac aac cta gcc ttc tgt ggc ccc agt gaa 384 
Met Glu Thr Met Trp Thr Tyr Asn Leu Ala Phe Cys Gly Pro Ser Glu 
115 120 125 

att aat cac ttc tac tgt gtg gac cca cca ctg att aag ctg gct tgt 432 
Ile Asn His Phe Tyr Cys Val Asp Pro Pro Leu Ile Lys Leu Ala Cys 
130 135 140 

tct gac acc tac aac aag gag gtg tca atg ttt gtt gtg gct ggt ttc 480 
Ser Asp Thr Tyr Asn Lys Glu Val Ser Met Phe Val Val Ala Gly Phe 
145 150 155 160 

aac ttc act tat cct ctc ctt atc atc ctc att tcc tat ctc tac ata 528 
Asn Phe Thr Tyr Pro Leu Leu Ile Ile Leu Ile Ser Tyr Leu Tyr Ile 
165 170 175 

ttt cct gcc acc cta agg atc tgc tct aca gaa ggc agg cac aaa gct 576 
Phe Pro Ala Thr Leu Arg Ile Cys Ser Thr Glu Gly Arg His Lys Ala 
180 185 190 

ttt tct acc tgt ggc tcc cat ctg aca gcc gtt act att ttc tat tca 624 
Phe Ser Thr Cys Gly Ser His Leu Thr Ala Val Thr Ile Phe Tyr Ser 
195 200 205 

gct ctt ttc ttc atg tat ctc aga cgt cca tca gaa gag tcc atg gag 672 
Ala Leu Phe Phe Met Tyr Leu Arg Arg Pro Ser Glu Glu Ser Met Glu 
210 215 220 

cag ggg aaa atg gta gct gta ttt tat acc act gta atc ccc atg ttg 720 
Gln Gly Lys Met Val Ala Val Phe Tyr Thr Thr Val Ile Pro Met Leu 
225 230 235 240 

aat ccc atg atc tac agt ctg agg aac aaa gat gtg aaa gag gca tta 768 
Asn Pro Met Ile Tyr Ser Leu Arg Asn Lys Asp Val Lys Glu Ala Leu 
245 250 255 

tgc aaa gaa ctg ttc aaa aga aaa ttg ttt tct aaa taa 807 
Cys Lys Glu Leu Phe Lys Arg Lys Leu Phe Ser Lys * 
260 265 



<210> 9 

<211> 363 

<212> DNA 

<213> Homo sapiens 



<400> 9 



atg aga agg aac ttc acg ttg gtg act gag ttc att ctc ctg gga ctg 48 
Met Arg Arg Asn Phe Thr Leu Val Thr Glu Phe Ile Leu Leu Gly Leu 
1 5 10 15 

acg aat cac cag gaa tta cag att ctc ctc ttc atg ctg ttt ctg gcc 96 
Thr Asn His Gln Glu Leu Gln Ile Leu Leu Phe Met Leu Phe Leu Ala 
20 25 30 

att tac atg gtc aca gtg gca ggg aat ctt agc atg att gcc ctc atc 144 
Ile Tyr Met Val Thr Val Ala Gly Asn Leu Ser Met Ile Ala Leu Ile 
35 40 45 

cag gcc aat gcc cgg ctc cac acg ccc atg tac ttt ttc ctg agc cac 192 
Gln Ala Asn Ala Arg Leu His Thr Pro Met Tyr Phe Phe Leu Ser His 
50 55 60 

tta tcc ttc ctg gat ctg tgc ttc tct tcc aat gtg acc cca aag atg 240 
Leu Ser Phe Leu Asp Leu Cys Phe Ser Ser Asn Val Thr Pro Lys Met 
65 70 75 80 

ctg gag att ttc ctt tca gag aag aaa agc att tcc tat cct gcc tgt 288 
Leu Glu Ile Phe Leu Ser Glu Lys Lys Ser Ile Ser Tyr Pro Ala Cys 
85 90 95 

ctt gtt cag tgt tac ctt tat atc atc ttg gta cac gtt gag atc tac 336 
Leu Val Gln Cys Tyr Leu Tyr Ile Ile Leu Val His Val Glu Ile Tyr 
100 105 110 

atc ctg gct gtg atg gcc ttt gac tag 363 
Ile Leu Ala Val Met Ala Phe Asp * 
115 120 



















<210> 10 

<211> 936 

<212> DNA 

<213> Homo sapiens 

<400> 10 

atg aga aga aac tgc acg ttg gtg act gag ttc att ctc ctg gga ctg 48 
Met Arg Arg Asn Cys Thr Leu Val Thr Glu Phe Ile Leu Leu Gly Leu 
1 5 10 15 

acc agt cgc cgg gaa tta caa att ctc ctc ttc acg ctg ttt ctg gcc 96 
Thr Ser Arg Arg Glu Leu Gln Ile Leu Leu Phe Thr Leu Phe Leu Ala 
20 25 30 

att tac atg gtc acg gtg gca ggg aac ctt ggc atg att gtc ctc atc 144 
Ile Tyr Met Val Thr Val Ala Gly Asn Leu Gly Met Ile Val Leu Ile 
35 40 45 

cag gcc aac gcc tgg ctc cac atg ccc atg tac ttt ttc ctg agc cac 192 
Gln Ala Asn Ala Trp Leu His Met Pro Met Tyr Phe Phe Leu Ser His 
50 55 60 

tta tcc ttc gtg gat ctg tgc ttc tct tcc aat gtg act cca aag atg 240 
Leu Ser Phe Val Asp Leu Cys Phe Ser Ser Asn Val Thr Pro Lys Met 
65 70 75 80 

ctg gag att ttc ctt tca gag aag aaa agc att tcc tat cct gcc tgt 288 
Leu Glu Ile Phe Leu Ser Glu Lys Lys Ser Ile Ser Tyr Pro Ala Cys 
85 90 95 

ctt gtg cag tgt tac ctt ttt atc gcc ttg gtc cat gtt gag atc tac 336 
Leu Val Gln Cys Tyr Leu Phe Ile Ala Leu Val His Val Glu Ile Tyr 
100 105 110 

atc ctg gct gtg atg gcc ttt gac cgg tac atg gcc atc tgc aac cct 384 
Ile Leu Ala Val Met Ala Phe Asp Arg Tyr Met Ala Ile Cys Asn Pro 
115 120 125 

ctg ctt tat ggc agc aga atg tcc aag agt gtg tgc tcc ttc ctc atc 432 
Leu Leu Tyr Gly Ser Arg Met Ser Lys Ser Val Cys Ser Phe Leu Ile 
130 135 140 

acg gtg cct tat gtg tat gga gcg ctc act ggc ctg atg gag acc atg 480 
Thr Val Pro Tyr Val Tyr Gly Ala Leu Thr Gly Leu Met Glu Thr Met 
145 150 155 160 

tgg acc tac aac cta gcc ttc tgt ggc ccc aat gaa att aat cac ttc 528 
Trp Thr Tyr Asn Leu Ala Phe Cys Gly Pro Asn Glu Ile Asn His Phe 
165 170 175 

tac tgt gcg gac cca cca ctg att aag ctg gct tgt tct gac acc tac 576 
Tyr Cys Ala Asp Pro Pro Leu Ile Lys Leu Ala Cys Ser Asp Thr Tyr 
180 185 190 

aac aag gag ttg tca atg ttt att gtg gct ggc tgg aac ctt tct ttt 624 
Asn Lys Glu Leu Ser Met Phe Ile Val Ala Gly Trp Asn Leu Ser Phe 
195 200 205 

tct ctc ttc atc ata tgt att tcc tac ctt tac att ttc cct gct att 672 
Ser Leu Phe Ile Ile Cys Ile Ser Tyr Leu Tyr Ile Phe Pro Ala Ile 
210 215 220 

tta aag att cgc tct aca gag ggc agg caa aaa gct ttt tct acc tgt 720 
Leu Lys Ile Arg Ser Thr Glu Gly Arg Gln Lys Ala Phe Ser Thr Cys 
225 230 235 240 

ggc tcc cat ctg aca gct gtc act ata ttc tat gca acc ctt ttc ttc 768 
Gly Ser His Leu Thr Ala Val Thr Ile Phe Tyr Ala Thr Leu Phe Phe 
245 250 255 

atg tat ctc aga ccc ccc tca aag gaa tct gtt gaa cag ggt aaa atg 816 
Met Tyr Leu Arg Pro Pro Ser Lys Glu Ser Val Glu Gln Gly Lys Met 
260 265 270 

gta gct gta ttt tat acc aca gta atc cct atg ctg aac ctt ata att 864 
Val Ala Val Phe Tyr Thr Thr Val Ile Pro Met Leu Asn Leu Ile Ile 
275 280 285 

tat agc ctt aga aat aaa aat gta aaa gaa gca tta atc aaa gag ctg 912 
Tyr Ser Leu Arg Asn Lys Asn Val Lys Glu Ala Leu Ile Lys Glu Leu 
290 295 300 

tca atg aag ata tac ttt tct taa 936 
Ser Met Lys Ile Tyr Phe Ser * 
305 310 



<210> 11 

<211> 180 

<212> DNA 

<213> Homo sapiens 



<400> 11 



atg tcc aga aga aac tat act gaa ctg aca gaa ttt gtt ctc ttg ggt 48 
Met Ser Arg Arg Asn Tyr Thr Glu Leu Thr Glu Phe Val Leu Leu Gly 
1 5 10 15 

cta aca agc cgt cca gag ctg cga gtt gct ttc ttg gca ctg ttc ctt 96 
Leu Thr Ser Arg Pro Glu Leu Arg Val Ala Phe Leu Ala Leu Phe Leu 
20 25 30 

ttt gtc tac ata gcc act gtg gta gga aac ttg ggg atg att att tta 144 
Phe Val Tyr Ile Ala Thr Val Val Gly Asn Leu Gly Met Ile Ile Leu 
35 40 45 

atc aaa gtt gat tct cga ctt cac act ccc atg taa 180 
Ile Lys Val Asp Ser Arg Leu His Thr Pro Met * 
50 55 60 
<210> 12 
<211> 64 
<212> PRT 

<213> Homo sapiens 
<400> 12 

Met Ser Phe Leu Ile Arg Ser Asp Ser Thr Leu His Thr Pro Met Cys 
1 5 10 15 

Leu Phe Leu Ser His Leu Ser Phe Val Asp Leu Tyr Tyr Ala Thr Asn 
20 25 30 

Ala Thr Pro Pro Met Leu Val Asn Phe Phe Phe Pro Arg Glu Lys Pro 
35 40 45 

Phe Pro Leu Leu Val Ala Leu Ser Asn Phe Thr Phe Ser Leu His Trp 
50 55 60 

<210> 13 

<211> 315 

<212> PRT 

<213> Homo sapiens 

<400> 13 



Met Phe Ser Pro Asn His Thr Ile Val Thr Glu Phe Ile Leu Leu Gly 
1 5 10 15 

Leu Thr Asp Asp Pro Val Leu Glu Lys Ile Leu Phe Gly Val Phe Leu 
20 25 30 

Ala Ile Tyr Leu Ile Thr Leu Ala Gly Asn Leu Cys Met Ile Leu Leu 
35 40 45 

Ile Arg Thr Asn Ser His Leu Gln Thr Pro Met Tyr Phe Phe Leu Gly 
50 55 60 

His Leu Ser Phe Val Asp Ile Cys Tyr Ser Ser Asn Val Thr Pro Asn 
65 70 75 80 

Met Leu His Asn Phe Leu Ser Glu Gln Lys Thr Ile Ser Tyr Ala Gly 
85 90 95 

Cys Phe Thr Gln Cys Leu Leu Phe Ile Ala Leu Val Ile Thr Glu Phe 
100 105 110 

Tyr Phe Leu Ala Ser Met Ala Leu Asp Arg Tyr Val Ala Ile Cys Ser 
115 
Arg Tyr Met Ala Ile Cys Asn Pro 
115 120 125 

Leu Leu Tyr Gly Ser Arg Met Ser Lys Ser Val Cys Ser Phe Leu Ile 
130 135 140 

Thr Val Pro Tyr Val Tyr Gly Ala Leu Thr Gly Leu Met Glu Thr Met 
145 150 155 160 

Trp Thr Tyr Asn Leu Ala Phe Cys Gly Pro Asn Glu Ile Asn His Phe 
165 170 175 

Tyr Cys Ala Asp Pro Pro Leu Ile Lys Leu Ala Cys Ser Asp Thr Tyr 
180 185 190 

Asn Lys Glu Leu Ser Met Phe Ile Val Ala Gly Trp Asn Leu Ser Phe 
195 200 205 

Ser Leu Phe Ile Ile Cys Ile Ser Tyr Leu Tyr Ile Phe Pro Ala Ile 
210 215 220 

Leu Lys Ile Arg Ser Thr Glu Gly Arg Gln Lys Ala Phe Ser Thr Cys 
225 230 235 240 

Gly Ser His Leu Thr Ala Val Thr Ile Phe Tyr Ala Thr Leu Phe Phe 
245 250 255 

Met Tyr Leu Arg Pro Pro Ser Lys Glu Ser Val Glu Gln Gly Lys Met 
260 265 270 

Val Ala Val Phe Tyr Thr Thr Val Ile Pro Met Leu Asn Leu Ile Ile 
275 280 285 

Tyr Ser Leu Arg Asn Lys Asn Val Lys Glu Ala Leu Ile Lys Glu Leu 
290 295 300 

Ser Met Lys Ile Tyr Phe Ser 
305 310 

<210> 21 

<211> 59 

<212> PRT 

<213> Homo sapiens 

<400> 21 

Met Ser Arg Arg Asn Tyr Thr Glu Leu Thr Glu Phe Val Leu Leu Gly 
1 5 10 15 

Leu Thr Ser Arg Pro Glu Leu Arg Val Ala Phe Leu Ala Leu Phe Leu 
20 25 30 

Phe Val Tyr Ile Ala Thr Val Val Gly Asn Leu Gly Met Ile Ile Leu 
35 40 45 

Ile Lys Val Asp Ser Arg Leu His Thr Pro Met 
50 55 

<210> 22 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<400> 22 

cctggagggt ttcaaaggct gatactttag 30 

<210> 23 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<400> 23 

ctccagcctg agcaacagag caatac 26 

<210> 24 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<400> 24 

ctcacattca ttgttcttca cagacccagc 30 

<210> 25 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<400> 25 

ccctgctggg atctggatca agac 24 
<210> 26 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> sequencing oligonucleotide PrimerPU 
<400> 26 

tgtaaaacga cggccagt 18 

<210> 27 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> sequencing oligonucleotide PrimerRP 
<400> 27 

caggaaacag ctatgacc 18 
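
Each entry above declares its length in the <211> field and repeats a running count at the end of the sequence line. A minimal sketch of a consistency check between the declared length and the residues actually printed is given here, using the six oligonucleotide entries SEQ ID 22 to 27; the dictionary `ENTRIES` and the helper names are illustrative and not part of the application.

```python
# Declared <211> length and printed sequence for SEQ ID 22 to 27.
ENTRIES = {
    22: (30, "cctggagggt ttcaaaggct gatactttag"),
    23: (26, "ctccagcctg agcaacagag caatac"),
    24: (30, "ctcacattca ttgttcttca cagacccagc"),
    25: (24, "ccctgctggg atctggatca agac"),
    26: (18, "tgtaaaacga cggccagt"),
    27: (18, "caggaaacag ctatgacc"),
}

def residue_count(printed: str) -> int:
    """Count residues, ignoring the spaces between ten-base groups."""
    return sum(len(group) for group in printed.split())

def length_mismatches(entries: dict) -> dict:
    """Return {seq_id: (declared, counted)} for entries whose lengths disagree."""
    return {
        seq_id: (declared, residue_count(printed))
        for seq_id, (declared, printed) in entries.items()
        if declared != residue_count(printed)
    }

print(length_mismatches(ENTRIES))
```

An empty result means every declared length agrees with the printed residues; any entry that appears in the result points to a transcription problem in either the <211> field or the sequence line.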
1. Claims: 1, 2, 7, 13, 16-24 (all part) 

Olfactory receptor genomic sequence SEQ ID 1 

2. Claims: 1, 2, 7, 13-26, 35-37 (all part) 

OLF1 polypeptide and encoding polynucleotide (SEQ IDs 2 and 12) 

3. Claims: 1, 2, 7, 13-26, 35-37 (all part) 

OLF1 polypeptide and encoding polynucleotide (SEQ IDs 3 and 13) 

4. Claims: 1, 2, 7, 13-26, 35-37 (all part) 

OLF1 polypeptide and encoding polynucleotide (SEQ IDs 4 and 14) 

5. Claims: 1, 2, 7, 13-26, 35-37 (all part) 

OLF1 polypeptide and encoding polynucleotide (SEQ IDs 5 and 15) 

6. Claims: 1, 2, 7, 13-26, 35-37 (all part) 

OLF1 polypeptide and encoding polynucleotide (SEQ IDs 6 and 16) 

7. Claims: 1, 2, 7, 13-26, 35-37 (all part) 

OLF1 polypeptide and encoding polynucleotide (SEQ IDs 7 and 17) 

8. Claims: 1, 2, 7, 13-26, 35-37 (all part) 

OLF1 polypeptide and encoding polynucleotide (SEQ IDs 8 and 18) 

9. Claims: 1, 2, 7, 13-26, 35-37 (all part) 

OLF1 polypeptide and encoding polynucleotide (SEQ IDs 9 and 19) 

10. Claims: 1, 2, 7, 13-26, 35-37 (all part) 

OLF1 polypeptide and encoding polynucleotide (SEQ IDs 10 and 20) 

11. Claims: 1, 2, 7, 13-26, 35-37 (all part) 

OLF1 polypeptide and encoding polynucleotide (SEQ IDs 11 and 21) 

12-24. Claims: 1-3, 5-10, 13-24 (all part) 

Groups 12-24: Polynucleotides of 8-50 bp in length from SEQ ID 1, 
including an olfactory related biallelic marker selected 
from A1-A13 

25. Claims: 1-3, 5, 7-10, 13-24 (all part) 

Group 25: Polynucleotides of 8-50 bp in length from SEQ ID 1 including 
an olfactory related biallelic marker other than A1-A13. 
Note that in this group further non-unity is present, since 
each group of fragments not linked to the others by a common 
structural feature is regarded as a separate invention. 
Moreover, this group is not searchable since the markers are 
not defined. 

26-29. Claims: 11, 12 

Groups 26-29: Oligonucleotide primers D1-D13 (group 26), 
E1-E13 (group 27), B1-B11 (group 28) and C1-C11 (group 29) 

30. Claim: 1 (part) 

Polynucleotides of at least 12 nucleotides of SEQ 
ID 1 as defined in claim 1 and not falling into another 
group. Note that in this group further non-unity is present, 
since each group of fragments not linked to the others by a 
common structural feature is regarded as a separate 
invention, e.g. the claims define 114448 separate 12mer 
oligonucleotide fragments which each may be a separate group. 
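
The last group counts every contiguous 12-nucleotide fragment of SEQ ID 1 as a potentially separate invention. For a linear sequence of length L, the number of contiguous fragments of length k is L - k + 1. A minimal sketch of that count follows; it is illustrated on the 18-nucleotide primer of SEQ ID 27 rather than on SEQ ID 1, whose full length is not restated here, and the function names are illustrative only.

```python
def count_windows(sequence_length: int, k: int) -> int:
    """Number of contiguous k-nucleotide fragments in a linear sequence."""
    if k <= 0:
        raise ValueError("fragment length must be positive")
    return max(sequence_length - k + 1, 0)

def windows(sequence: str, k: int) -> list[str]:
    """Enumerate the fragments themselves, left to right."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

primer = "caggaaacagctatgacc"  # SEQ ID 27, 18 nucleotides
print(count_windows(len(primer), 12))
print(windows(primer, 12)[0])
```

The same function applied to the length of SEQ ID 1 gives the number of distinct 12mer start positions that the group refers to.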